Particle Swarm Optimization: A Comprehensive Survey

Particle swarm optimization (PSO) is one of the most well-regarded swarm-based algorithms in the literature. Although the original PSO has shown good optimization performance, it still suffers severely from premature convergence. As a result, many researchers have modified it, producing a large number of PSO variants with either slightly or significantly better performance. The standard PSO has been modified mainly by four strategies: modification of the PSO controlling parameters, hybridization of PSO with other well-known meta-heuristic algorithms such as genetic algorithm (GA) and differential evolution (DE), cooperation, and multi-swarm techniques. This paper attempts to provide a comprehensive review of PSO, including the basic concepts of PSO, binary PSO, neighborhood topologies in PSO, recent and historical PSO variants, remarkable engineering applications of PSO, and its drawbacks. Moreover, this paper reviews recent studies that utilize PSO to solve feature selection problems. Finally, eight potential research directions that can help researchers further enhance the performance of PSO are provided.


I. INTRODUCTION
Many engineering applications, such as electrical power systems and signal processing, require an efficient and effective algorithm that can solve their field-related optimization problems. Real-world optimization problems have been solved by swarm algorithms such as particle swarm optimization (PSO) [1] and ant colony optimization (ACO) [2] as well as other meta-heuristic algorithms including genetic algorithm (GA) [3] and differential evolution (DE) [4].
Generally, most meta-heuristic algorithms can solve many different types of optimization problems. Nevertheless, these algorithms may have one or more of the following drawbacks:
• Having a lot of parameters to be tuned.
• Requiring high programming skills to build the algorithm.
• High computational cost.
• The need to transform algorithms into binary forms.
PSO was initially introduced by Kennedy and Eberhart [5] in 1995. The PSO algorithm has attracted a lot of researchers in the last decade due to its simple implementation and fewer controlling parameters. The idea and formulation of the PSO algorithm were inspired by observing the social behavior of bird flocking and fish schooling. In nature, a swarm of birds flies in the space following a leader who has the closest position to the food. The social behavior of birds can be translated into algorithmic operations, as in PSO, to solve optimization problems where the swarm of birds is interpreted as a swarm of particles and each particle represents a candidate solution. The swarm of particles searches the space in given dimensions and finds the best solution that optimizes the problem at hand. The following points summarize some of the facts that make the PSO algorithm an attractive optimization algorithm:
• PSO is simple to implement and code.
• PSO has only three controlling parameters (inertia weight, cognitive ratio, and social ratio). A slight change in any of these three controlling parameters results in a different performance as shown in [6] and [7].
• PSO is flexible to hybridize with other optimization algorithms.
• PSO is efficient in controlling the balance between exploration and exploitation. Particles in the exploration phase explore the space extensively while the exploitation phase focuses on promising regions. The better the balance between exploration and exploitation, the better the PSO performance.
The abovementioned advantages have made PSO a promising candidate for optimizing a wide variety of real-world optimization problems and applications. In the literature, there have been a few PSO review papers that can be split into two categories: the first category reviews PSO and its applications in a specific field [8,9] whereas the second reviews existing PSO variants [10,11]. Although the article in [10] reviewed recent studies on PSO, the authors considered PSO in continuous search space only, and PSO in the binary form was excluded. In addition, the authors did not consider several important aspects such as the applications of PSO in optimization problems.
A recent article [11] reviewed the research works carried out on PSO but it was limited to binary PSO variants only. Recently, a survey paper on PSO has been published in [12] where several PSO variants in both continuous and discrete spaces are reviewed. However, the article covers neither neighborhood topologies nor the hybridization of PSO with other common meta-heuristic algorithms such as ACO and gravitational search algorithm (GSA). In addition, it focuses only on the application of PSO to solar photovoltaic systems without considering other engineering applications of PSO. Table I summarizes recent and important state-of-the-art PSO survey papers.
The main aim of this paper is to present a comprehensive review of PSO that includes continuous PSO, binary PSO, different PSO topologies, hybrid PSO variants, types of PSO variants (e.g., cooperative PSO and multi-swarm PSO), and the applications of PSO variants in optimization problems. More importantly, this review paper focuses on PSO-based feature selection. To the best of the authors' knowledge, there has been no publication on a comprehensive survey that covers the recent advances in PSO variant developments and the implementation of PSO to solve feature selection problems.
The main contributions of this review article can be summarized as follows:
1-A comprehensive and critical review of PSO and its variants is provided. The limitations of existing PSO variants are identified and some insightful recommendations are provided to overcome these limitations. In addition, clear guidance that includes the essential steps to develop novel robust PSO variants is provided.
2-This paper attempts to provide a thorough review of the applications of PSO to feature selection problems due to their extreme importance in the artificial intelligence field. Moreover, a comprehensive review of PSO-based feature selection is still lacking.
3-Eight potential research directions are identified to further enhance the optimization performance of PSO.
The rest of this paper is organized as follows. Section II illustrates the formulation of the PSO algorithm and other basic concepts related to PSO. It also highlights different neighborhood topologies used in PSO. In Section III, the modifications introduced to the original PSO by inertia weight and constriction factor concepts are discussed. In addition, it reviews several strategies that have been used to control the PSO parameters and it critically reviews several recent high-performance PSO variants. Section III also reviews historical prominent variants of PSO. Section IV presents the PSO in binary form and its variants. In Section V, the steps required for validating novel PSO variants are provided. Section VI focuses on the application of PSO to solve feature selection problems. Moreover, prominent engineering applications of PSO are overviewed in Section VI. Section VII demonstrates the drawbacks of PSO while Section VIII provides some potential research directions that can help PSO researchers to enhance the performance of PSO further. Finally, Section IX concludes the paper.

A. PARTICLE SWARM OPTIMIZATION
The first PSO was presented by Kennedy and Eberhart as a continuous real-valued algorithm [5]. This version is referred to as the standard PSO (SPSO) throughout this paper. In SPSO, a swarm of particles flies in a D-dimensional search space seeking an optimal solution. Each particle $i$ possesses a current velocity vector $V_i = [v_{i1}, v_{i2}, \dots, v_{iD}]$ and a current position vector $X_i = [x_{i1}, x_{i2}, \dots, x_{iD}]$, where $D$ is the number of dimensions. The SPSO process starts by randomly initializing $V_i$ and $X_i$. Then, in each iteration, the best position that has been found by particle $i$, $Pbest_i = [Pbest_{i1}, Pbest_{i2}, \dots, Pbest_{iD}]$, and the best position that has been found by the whole swarm, $Gbest = [Gbest_1, Gbest_2, \dots, Gbest_D]$, guide particle $i$ to update its velocity and position by (1) and (2):

$$v_{id}(t+1) = v_{id}(t) + c_1 r_1 \left(Pbest_{id}(t) - x_{id}(t)\right) + c_2 r_2 \left(Gbest_d(t) - x_{id}(t)\right) \qquad (1)$$

$$x_{id}(t+1) = x_{id}(t) + v_{id}(t+1) \qquad (2)$$

where $c_1$ and $c_2$ are the cognitive and social acceleration coefficients, and $r_1$ and $r_2$ are two uniform random values generated within the [0,1] interval. The pseudo-code of the SPSO for solving a minimization problem is shown in Algorithm 1.

Algorithm 1
The pseudo-code of the SPSO for solving a minimization problem [1]
1: Initialization
2: Define the swarm size $N$ and the number of dimensions $D$
3: for each particle $i \in [1..N]$
4: Randomly generate $X_i$ and $V_i$, and evaluate the fitness of $X_i$ denoting it as $f(X_i)$
5:
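Since only the initialization steps of Algorithm 1 are reproduced here, the following is a minimal sketch of how a full SPSO loop for minimization could look, assuming the standard velocity update without inertia weight, $v \leftarrow v + c_1 r_1 (Pbest - x) + c_2 r_2 (Gbest - x)$, followed by $x \leftarrow x + v$. The function name `spso`, its parameters, the symmetric velocity clamp at half the search range, and the clipping of positions to the bounds are illustrative assumptions, not details taken from [1] or [5].

```python
import random


def spso(fitness, dim, bounds, swarm_size=30, iters=200, c1=2.0, c2=2.0, seed=0):
    """Minimize `fitness` over a box using a basic PSO loop (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.5 * (hi - lo)  # assumption: clamp velocities to half the search range

    # Initialization: random positions X and velocities V for every particle.
    X = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    V = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(swarm_size)]

    # Personal bests start at the initial positions.
    pbest = [x[:] for x in X]
    pbest_f = [fitness(x) for x in X]

    # Global best is the best personal best found so far.
    g = min(range(swarm_size), key=pbest_f.__getitem__)
    gbest, gbest_f = pbest[g][:], pbest_f[g]

    for _ in range(iters):
        for i in range(swarm_size):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (V[i][d]
                     + c1 * r1 * (pbest[i][d] - X[i][d])
                     + c2 * r2 * (gbest[d] - X[i][d]))
                V[i][d] = max(-v_max, min(v_max, v))             # velocity clamping
                X[i][d] = max(lo, min(hi, X[i][d] + V[i][d]))    # keep inside bounds
            f = fitness(X[i])
            if f < pbest_f[i]:                                   # update personal best
                pbest[i], pbest_f[i] = X[i][:], f
                if f < gbest_f:                                  # update global best
                    gbest, gbest_f = X[i][:], f
    return gbest, gbest_f
```

A call such as `spso(lambda x: sum(v * v for v in x), dim=2, bounds=(-5.0, 5.0))` returns the best position found and its fitness. The global best is updated as soon as a particle improves on it, which is one of several possible design choices.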

B. VELOCITY CLAMPING
Velocity clamping was initially introduced by Eberhart and Kennedy [31] to avoid velocity explosion and divergence. Velocity clamping limits the particles to move within a boundary in the search space by setting up a maximum velocity $V_{max}$. If the updated velocity of a particle is found to exceed the maximum velocity $V_{max}$, then it is set to $V_{max}$ as follows:

$$v_{id}(t+1) = \begin{cases} v_{id}(t+1), & \text{if } v_{id}(t+1) < V_{max} \\ V_{max}, & \text{otherwise} \end{cases}$$

Although velocity clamping helps to prevent velocity explosion, finding a proper value of $V_{max}$ is essential and is not an easy task. Poor performance might occur if $V_{max}$ is not selected properly. For large values of $V_{max}$, the particles might fly in a very random manner and skip the optimal solution. On the contrary, for small values of $V_{max}$, the particles would have a very narrow search space, which might result in being trapped in a local optimum. To resolve this critical problem, the maximum velocity can be set as follows [32,33]:

$$V_{max} = \delta \left(x_{max} - x_{min}\right)$$

where $x_{max}$ and $x_{min}$ are the maximum and minimum values of the search space boundary, respectively, and $\delta \in (0,1]$.
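As a small illustration of the clamping rule, the sketch below computes a maximum velocity as a fraction of the search range and clamps a velocity vector to it. The function names and the symbol `delta` for the fraction in (0,1] are hypothetical, and clamping symmetrically to the interval between the negative and positive maximum velocity is an assumption that goes slightly beyond the one-sided rule described in the text.

```python
def max_velocity(x_min, x_max, delta=0.5):
    """Maximum velocity as a fraction `delta` in (0, 1] of the search range."""
    if not 0.0 < delta <= 1.0:
        raise ValueError("delta must lie in (0, 1]")
    return delta * (x_max - x_min)


def clamp_velocity(velocity, v_max):
    """Clamp every component to [-v_max, v_max] (symmetric clamp is an assumption)."""
    return [max(-v_max, min(v_max, v)) for v in velocity]
```

For a search range of [-5, 5] and `delta = 0.2`, `max_velocity` gives 2.0, so a component of 3.0 would be reduced to 2.0 while a component of 0.5 would be left unchanged.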

C. POPULATION SIZE
Population size is defined as the number of particles in the swarm. It is a crucial parameter that characterizes the convergence performance of PSO. The main concern here is finding the optimal swarm size at which the best convergence performance of PSO can be attained. This concern has been addressed in [34,35] where the effect of the swarm size on PSO performance was investigated. The conclusion drawn in [34,36] states that a small number of particles limits the ability of the swarm to explore the search space and produces poor solutions, while a large number of particles improves the solution quality yet increases the computational complexity. Also, it is concluded that the optimal swarm size depends on the characteristics of the fitness function to be optimized. In the PSO literature, it is common to set the population size to between 20 and 50 particles [37][38][39][40][41].

D. STOPPING CRITERIA
Typically, there are two types of stopping criteria that are used to terminate the PSO run. In the first stopping criterion, the execution of PSO stops when a predefined number of iterations is reached. This criterion has been widely used in the literature (e.g., [41], [40]). The second stopping criterion is the number of function evaluations (FEs) [37,42,43,44], calculated as follows:

$$FEs = N \times t_{max}$$

where $N$ is the swarm size and $t_{max}$ is the maximum number of iterations.

E. CONTROLLING PARAMETERS OF PSO
In general, PSO has three main controlling parameters: inertia weight $w$, the cognitive component $c_1$, and the social component $c_2$. These parameters have a remarkable effect on the PSO performance where the best performance can only be obtained by a proper setting of these parameters. In the literature, many research efforts have been carried out to enhance the performance of PSO by tuning these controlling parameters through different mechanisms. The following subsections focus on the state-of-the-art mechanisms for tuning these three parameters.

1) INERTIA WEIGHT
The existing inertia weight mechanisms can be classified into three groups. The first group includes mechanisms where the inertia weight is either static or random. This type of mechanism does not require any feedback or historical knowledge input. In the second group, the inertia weight changes with time. In other words, the inertia weight is a function of the iteration number. This mechanism is known as time-varying inertia weight. The third group is called adaptive inertia weight where the inertia weight keeps adjusting its value based on a feedback parameter. These three mechanisms are further elaborated as follows:

THE STATIC AND RANDOM INERTIA WEIGHT
As mentioned earlier, the inertia weight was introduced by Shi and Eberhart. In their work, a range of inertia weight values has been tested and the results showed that a better performance is obtained when $w$ is in the range [0.8, 1.2].
In [47], the inertia weight was presented as a random value. This method is suitable for applications in a dynamic environment since it is not easy to predict whether a large or a small value of $w$ is needed. The random inertia weight is expressed as $w = 0.5 + r/2$, where $r$ is a uniform random number in [0,1]. Therefore, $w$ is limited to values in the range [0.5,1].
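A random inertia weight confined to [0.5, 1] can be sketched as below, assuming the commonly used form $w = 0.5 + r/2$ with $r$ drawn uniformly from [0,1). The function name is hypothetical and the exact expression used in [47] is an assumption here.

```python
import random


def random_inertia_weight(rng=None):
    """Draw an inertia weight in [0.5, 1) as 0.5 + r/2 (assumed form)."""
    rng = rng or random
    return 0.5 + rng.random() / 2.0
```

A fresh value would typically be drawn at every iteration, so the swarm alternates between more exploratory and more exploitative steps without any schedule.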

TIME VARYING INERTIA WEIGHT
In PSO, an extensive global search (exploration) is required at the early part of the process while the latter part requires focused local search (exploitation). A static inertia weight cannot meet such requirements. Thus, Shi and Eberhart [48] introduced the first time-varying inertia weight method called linearly-varying inertia weight (LVIW) to address this issue. The mathematical formula of this method is expressed as follows:

$$w(t) = \left(w_{ini} - w_{fin}\right) \frac{t_{max} - t}{t_{max}} + w_{fin}$$

where $w_{ini}$ and $w_{fin}$ are the initial and final values of the inertia weight, respectively, $t_{max}$ is the maximum number of iterations, and $t$ is the number of the current iteration.
In their experimental study, Shi and Eberhart [48] noticed that better performance is achieved if the PSO run starts with an inertia weight value of 0.9 that is linearly decreased until it reaches a value of 0.4 by the end of the PSO run. This setting means that a global search is performed at the beginning of the PSO run and the search gradually becomes locally focused. LVIW is one of the most common, if not the most common, time-varying techniques and has been widely used by many researchers. Besides this technique, many time-varying inertia weight techniques have been proposed with different performance achievements. The formulae of such techniques are presented in Table II.
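The linearly decreasing schedule described above can be sketched as a one-line function. The name `lviw` and its parameter names are hypothetical. The defaults 0.9 and 0.4 are the start and end values reported for [48], and the linear interpolation form is the usual way of writing this schedule.

```python
def lviw(t, t_max, w_start=0.9, w_end=0.4):
    """Linearly decrease the inertia weight from w_start (t = 0) to w_end (t = t_max)."""
    return w_end + (w_start - w_end) * (t_max - t) / t_max
```

With `t_max = 100`, the weight is 0.9 at iteration 0, about 0.65 at iteration 50, and 0.4 at iteration 100.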

ADAPTIVE INERTIA WEIGHT
In this group, the value of the inertia weight is adjusted based on at least one feedback parameter. Utilizing the concept of success rate [49], an adaptive inertia weight technique has been proposed in [50]. This adaptive technique considers the percentage of success as the feedback parameter. The inertia weight of this adaptive strategy is expressed as follows:

$$w(t) = \left(w_{max} - w_{min}\right) P_s(t) + w_{min}$$

where $w_{max}$ and $w_{min}$ are in the range [0,1] and $P_s(t) \in [0,1]$ is the percentage of particles that succeeded in enhancing their fitness in the previous iteration. Other adaptive inertia weight strategies are shown in Table II.
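The success-rate feedback can be illustrated with the sketch below, which assumes a minimization problem and a linear map from the success percentage to the inertia weight. The function names, and the choice to count a particle as successful when its personal-best fitness strictly decreases, are illustrative assumptions rather than the exact definitions in [49] and [50].

```python
def success_rate(prev_pbest_fitness, new_pbest_fitness):
    """Fraction of particles whose personal-best fitness improved (minimization)."""
    improved = sum(1 for old, new in zip(prev_pbest_fitness, new_pbest_fitness)
                   if new < old)
    return improved / len(prev_pbest_fitness)


def adaptive_inertia_weight(ps, w_min=0.0, w_max=1.0):
    """Map the success rate ps in [0, 1] linearly onto [w_min, w_max]."""
    return (w_max - w_min) * ps + w_min
```

A high success rate yields a large inertia weight, so the swarm keeps moving while it is still improving, whereas a low success rate shrinks the weight and focuses the search.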

2) ACCELERATION COEFFICIENTS
The acceleration coefficients $c_1$ and $c_2$ guide the PSO search towards the optimal solution. In [5], it was pointed out that a relatively high value of $c_1$ compared to $c_2$ causes particles to wander excessively in the search space. Conversely, a relatively high value of $c_2$ might cause the problem of premature convergence. The authors recommended statically setting the values of $c_1$ and $c_2$ to 2. Since then, many authors have followed this recommendation in their PSO studies. Although this setting appears to be the most common static strategy for $c_1$ and $c_2$, other settings such as $c_1 = c_2 = 1.49$ are also common. In [40], a hierarchical PSO with a time-varying acceleration coefficient (HPSO-TVAC) is proposed. At the beginning of the HPSO-TVAC process, it is suggested to have a large value of $c_1$ and a small value of $c_2$ to let particles perform an extensive search. On the contrary, a small value of $c_1$ and a large value of $c_2$ help particles to focus more on exploitation at the end of the searching process. The following mathematical expressions illustrate how the values of $c_1$ and $c_2$ are gradually varied:

$$c_1 = \left(c_{1f} - c_{1i}\right) \frac{t}{t_{max}} + c_{1i}$$

$$c_2 = \left(c_{2f} - c_{2i}\right) \frac{t}{t_{max}} + c_{2i}$$

where $t$ is the current iteration, $t_{max}$ is the maximum number of iterations, and the subscripts $f$ and $i$ denote the final and initial values, respectively. As suggested in [40], the values of $c_{1f}$, $c_{1i}$, $c_{2f}$ and $c_{2i}$ should be set to 0.5, 2.5, 2.5, 0.5, respectively.
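The time-varying acceleration coefficients can be sketched as a linear interpolation between initial and final values. The function name `tvac` and the linear-in-iteration form are assumptions made for illustration. The defaults use the values attributed to [40] above (cognitive coefficient from 2.5 down to 0.5, social coefficient from 0.5 up to 2.5).

```python
def tvac(t, t_max, c1_init=2.5, c1_final=0.5, c2_init=0.5, c2_final=2.5):
    """Return (c1, c2) at iteration t, interpolated linearly over [0, t_max]."""
    frac = t / t_max
    c1 = (c1_final - c1_init) * frac + c1_init
    c2 = (c2_final - c2_init) * frac + c2_init
    return c1, c2
```

At the start of the run the cognitive term dominates (2.5 versus 0.5), the two coefficients are equal at the midpoint (1.5 each), and at the end the social term dominates.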

F. NEIGHBORHOOD TOPOLOGIES IN PSO
Particles in a swarm are connected in a specific structure commonly known as a neighborhood topology within which they communicate with each other and share information. A study on how the neighborhood topology could influence the behavior of PSO operation was presented in [66]. Experimental results revealed that some neighborhood topologies perform better than others. The following subsections present various neighborhood topologies that have been used in PSO studies and applications.

1) STAR TOPOLOGY
The first PSO algorithm that was introduced in [5] was developed using a star topology where each particle considers all other particles as its neighbors. The star topology is also called Gbest, in which all particles move towards the best global particle in the swarm. The velocity and position update equations for the star topology are the same as equations (1) and (2), respectively. The star topology achieves the fastest convergence among the topologies as it has a great exploitation capability. However, it often suffers from convergence to local optima. The star topology has been widely used by many researchers in different applications due to its simple structure and fast convergence behavior.

2) RING TOPOLOGY
In the ring topology, each particle is connected to its two immediate neighbors forming a circle [66]. The ring topology is also known as Lbest, in which a particle is attracted by the best local particle that has been found in its neighborhood. The velocity update equation for the ring topology is modified as follows:

$$v_{id}(t+1) = v_{id}(t) + c_1 r_1 \left(Pbest_{id}(t) - x_{id}(t)\right) + c_2 r_2 \left(Lbest_{id}(t) - x_{id}(t)\right)$$

where $Lbest_i(t)$ is the best local position found in the $i$th particle neighborhood. The two neighbors of the $i$th particle are the $(i-1)$th particle and the $(i+1)$th particle. Particles in the ring topology fly towards their local best position. This provides diversity and protects the algorithm from becoming stuck at local optima. However, the convergence speed of the ring topology decreases since more information needs to be exchanged. In addition, the ring structure is not as simple as the star structure. One of the earliest PSO variants that implemented the ring topology is the fully informed particle swarm (FIPS) [67]. In FIPS, the particle's velocity relies on the best positions of all of its neighbors. Moreover, FIPS applies the concept of the constriction factor.
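To make the ring neighborhood concrete, the sketch below finds, for each particle, the index of the best personal best among itself and its two ring neighbors (minimization). The function name is hypothetical, and including the particle itself in its own neighborhood is a common convention assumed here, not something stated in the text.

```python
def ring_local_best(pbest_fitness):
    """For each particle i, return the index of the best pbest among i-1, i, i+1."""
    n = len(pbest_fitness)
    local_best = []
    for i in range(n):
        # Indices wrap around so the first and last particles are neighbors.
        neighborhood = [(i - 1) % n, i, (i + 1) % n]
        local_best.append(min(neighborhood, key=lambda j: pbest_fitness[j]))
    return local_best
```

For personal-best fitness values `[5.0, 1.0, 4.0, 3.0, 2.0]`, particles 0, 1, and 2 follow particle 1, while particles 3 and 4 follow particle 4, so good information spreads around the ring only gradually.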

3) VON NEUMANN
The Von Neumann topology is a rectangular grid, for example (3×4), resulting in a population of 12 particles, where each particle is connected to the particles above, below, to its right, and to its left, with the edges wrapped around. The Von Neumann topology showed superior performance over other topologies in many test problems [68].
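The wrapped grid neighborhood can be sketched as follows, assuming particles are numbered row by row on a `rows × cols` grid. The function name and the row-major numbering are illustrative assumptions.

```python
def von_neumann_neighbors(index, rows, cols):
    """Return the indices of the up, down, left, and right neighbors with wrap-around."""
    r, c = divmod(index, cols)
    up = ((r - 1) % rows) * cols + c
    down = ((r + 1) % rows) * cols + c
    left = r * cols + (c - 1) % cols
    right = r * cols + (c + 1) % cols
    return [up, down, left, right]
```

On the 3×4 grid used as the example above, particle 0 (top-left corner) is connected to particles 8, 4, 3, and 1, and particle 5 is connected to particles 1, 9, 4, and 6.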

4) DYNAMIC TOPOLOGY
In the dynamic topology, the neighborhood is refreshed and regrouped after a certain number of iterations. In [69], a dynamic neighborhood is developed where each particle, in the early stage of the PSO run, exchanges information with only a small number of particles. This enhances the exploration process in the early stage of the run. As the number of iterations increases, the neighborhood of each particle increases as well. At the end of the PSO run, all particles communicate with each other resulting in a higher exploitation capability. A dynamic neighborhood strategy named dynamic neighborhood learning PSO (DNLPSO) is presented in [70].
DNLPSO improved the CLPSO algorithm [71] by making the learning particle's neighborhood dynamic.

5) OTHER TOPOLOGIES
In [72], a complex neighborhood PSO was proposed where the neighborhood structure is a complex network that can be tuned during the PSO run. The star topology and the ring topology were combined in [73] to form a single PSO named unified PSO (UPSO). Other common topologies such as the pyramid, wheel, and cluster topologies are presented in [32,74,75,76].

III. PSO VARIANTS
Since the introduction of PSO, many new PSO variants have been proposed to enhance its optimization performance. Mainly, PSO is modified by developing new controlling parameters strategies, hybridizing PSO with other well-known meta-heuristic algorithms, cooperation and multi-swarm approaches. This section reviews recent and historical PSO variants and identifies their limitations.

1) INERTIA WEIGHT
To improve the convergence speed of SPSO, Shi and Eberhart [45] modified the SPSO velocity update equation for $v_{id}(t+1)$ by introducing a scaling factor that is multiplied by $v_{id}(t)$. This scaling factor is termed the inertia weight and denoted by $w$. Based on this modification, the velocity update equation in (1) takes the following form:

$$v_{id}(t+1) = w\, v_{id}(t) + c_1 r_1 \left(Pbest_{id}(t) - x_{id}(t)\right) + c_2 r_2 \left(Gbest_d(t) - x_{id}(t)\right)$$

2) CONSTRICTION FACTOR
Clerc and Kennedy modified the velocity update equation of the SPSO by introducing the concept of constriction factor [46]. The role of the constriction factor is to ensure that the PSO algorithm converges without using velocity clamping. By using the constriction factor, the velocity update equation becomes as follows:

$$v_{id}(t+1) = \chi \left[ v_{id}(t) + c_1 r_1 \left(Pbest_{id}(t) - x_{id}(t)\right) + c_2 r_2 \left(Gbest_d(t) - x_{id}(t)\right) \right]$$

where $\chi = 2\kappa / \left|2 - \varphi - \sqrt{\varphi^2 - 4\varphi}\right|$, $\varphi = c_1 + c_2$ and $\kappa \in (0,1]$. The value of $\varphi$ must be > 4 to ensure convergence. The value of $\kappa$ controls the balance between exploration and exploitation. The exploration mode takes place when the value of $\kappa$ is large whereas the exploitation mode will be activated when the value of $\kappa$ is small. Eberhart and Shi stated that a combination of constriction factor and velocity clamping would speed up the convergence rate [28]. However, the constriction factor approach still faces the problem of becoming trapped in local optima.
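As a worked example of the constriction factor, the sketch below evaluates $\chi = 2\kappa / |2 - \varphi - \sqrt{\varphi^2 - 4\varphi}|$ with $\varphi = c_1 + c_2$. The function name is hypothetical, and the default coefficients of 2.05 each are a commonly used illustrative setting that does not come from this section.

```python
import math


def constriction_factor(c1=2.05, c2=2.05, kappa=1.0):
    """Compute the constriction factor for phi = c1 + c2 > 4 and kappa in (0, 1]."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("c1 + c2 must be greater than 4")
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0, 1]")
    return 2.0 * kappa / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))
```

With the default values the sum of the coefficients is 4.1 and the factor evaluates to roughly 0.73, so each velocity update is damped by about a quarter. Halving `kappa` halves the factor, which is how a smaller value pushes the swarm towards exploitation.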

B. RECENT PSO VARIANTS
In the last few years, many PSO variants have been proposed to overcome the limitations of the original PSO algorithm and the historical PSO variants. This part critically reviews recently published PSO variants. The authors in [77] proposed a new PSO variant named prey-predator PSO (PP-PSO) that implements catch, escape, and breeding strategies that can assist in enhancing the convergence speed and reducing the computational time. The proposed approach is tested on 10 classical benchmarking functions and the CEC2017 test suite for 10, 30, and 100 dimensions. Although this approach has shown good performance, it comes at the expense of an unreasonable number of function evaluations that can reach up to $10^6$. Moreover, the proposed variant was not tested on real-world engineering problems. In [78], a multi-swarm PSO is proposed where one sub-swarm focuses on exploration while a different sub-swarm performs exploitation. The performance of the proposed variant is tested on the CEC 2015 test suite for 10 and 30 dimensions. The performance of this variant on high-dimensional problems as well as real-world engineering problems is not investigated. The work in [79] proposed a competition-based PSO variant where each particle is allocated a competition coefficient that allows particles to be distinguished and divided into three groups. The proposed method is tested on the CEC 2013 benchmarking functions for 10 and 30 dimensions and on a retarder design problem as a real-world engineering problem. The impact of increasing the dimensions on the performance of the competition-based PSO needs to be investigated. The authors in [80] developed a new PSO variant that utilizes PSO with two differential mutations. The proposed approach was tested on 16 well-known benchmarking functions and CEC 2013 for 30 dimensions only.
In [81], a novel PSO variant is proposed where the main contribution is the utilization of the sigmoid function to update the PSO acceleration coefficients. The effectiveness of the proposed variant is evaluated by testing it only on 8 classical benchmarking functions for 30 dimensions. Further work is needed to evaluate the performance of this variant when it solves constrained optimization problems and real-world engineering problems. An improved social learning PSO is developed in [82] where the three best particles are updated using a differential mutation strategy. The developed approach is tested on the CEC 2013 test suite for 30 and 50 dimensions. Though the proposed variant has shown good performance, it was compared with PSO variants only. Moreover, its performance on real-world optimization problems is not studied. The authors in [83] developed a novel PSO variant for constrained optimization problems. The proposed approach was tested on twenty-four classical benchmarking functions for low-dimensional problems as well as on the reservoir drainage plan optimization problem. With the help of mixed mutation strategies, a new PSO variant is proposed in [84] based on the idea of dividing the total population into an elitist population and a general population. The effectiveness of the proposed algorithm is evaluated by testing its performance on sixteen well-known benchmarking functions for 30, 50, and 100 dimensions. This multi-population PSO variant requires a massive number of function evaluations to achieve good performance. In addition, its performance was not validated on real-world constrained optimization problems. Also, the effectiveness of this variant was compared with PSO variants only.
The original PSO velocity update equations are modified in [85] by adding two new terms that aim to enhance the performance. The new PSO variants are tested on sixteen classical benchmarking functions for 50 dimensions without evaluating their effectiveness on real-world optimization problems. In addition, their performance is compared with PSO variants only. A novel PSO variant that is tested on CEC2013 for 30 dimensions is proposed in [86]. The main concept of the proposed approach is to split the whole population into several sub-swarms using a chaotic sequence. To achieve good performance, this variant requires a massive number of function evaluations, which is computationally expensive. Utilizing complex-order derivatives, an improved version of PSO that is tested on CEC 2017 for 20, 30, and 40 dimensions is proposed in [87]. The improved PSO is only compared with PSO variants without considering other well-known meta-heuristic algorithms. Based on forgetting ability and multi-exemplar, a new version of PSO is proposed in [88] where its effectiveness is tested on CEC 2013 for 30 dimensions. Although the proposed approach shows good performance in terms of average fitness and standard deviation for most of the tested functions, massive function evaluations are needed to achieve such performance. In [89], inertia weight PSO [90], CLPSO [91], LIPS [92], HPSO-TVAC [93], and FDR-PSO [94] algorithms are combined to produce a new single variant. The performance of the new variant is evaluated on CEC2005 for 10 and 30 dimensions, and it is compared with PSO variants only.
PSO has recently been hybridized with several meta-heuristic algorithms. For example, PSO is hybridized with the whale optimization algorithm in [95] and its performance is evaluated on 18 classical benchmarking functions as well as on electronic design optimization problems. Although the proposed approach shows good performance, the results are based on only 20 independent runs, which might not be enough to produce accurate results. In [96], a hybrid PSO algorithm is developed utilizing an adaptive learning strategy. The effectiveness of the hybrid approach is tested on 12 classical benchmarking test functions and CEC 2013 for only 30 dimensions. Moreover, its performance is compared with PSO variants only. By hybridizing PSO with sine cosine acceleration coefficients, a novel hybrid algorithm is introduced in [97]. The performance of the hybrid algorithm is evaluated on 12 well-known benchmarking functions for 10, 30, and 50 dimensions. However, its performance on constrained optimization problems is not investigated.
In summary, recent PSO variants have shown good optimization performance. However, all the PSO variants presented in this subsection except [79], [83], and [95] did not consider constrained real-world optimization problems. Their performance on real-world constrained optimization problems needs to be investigated. The performance of [78][79][80][81] Table III summarizes the recent PSO variants discussed in this subsection and presents their ideas and limitations.

C. HISTORICAL PSO VARIANTS IN CONTINUOUS SEARCH SPACE
Since the introduction of SPSO in 1995, there has been a continuous research effort in enhancing the convergence speed, quality of achievable solutions, and stability of PSO. This has resulted in an enormous number of PSO variants some of which are dedicated to solving optimization problems in specific applications while the rest are used for general numerical optimization. This subsection discusses in detail the most important historical PSO variants that have been developed since the advent of PSO.

1) COOPERATIVE PSO
Cooperation, in the context of meta-heuristics, is defined as exchanging information between a number of agents to perform a specific task [98]. Though individual human beings can work separately and compete with each other to enhance their performance, better enhancement can be achieved by cooperation. Potter and De Jong [99] applied the cooperation concept in genetic algorithms (GAs). In [100], the same idea was extended to PSO, and a new PSO variant named cooperative particle swarm optimization (CPSO) was introduced. In SPSO, each particle consists of a D-dimensional vector that represents a candidate solution. The position and velocity updates that occur in each iteration treat this D-dimensional vector as one entity. Hence, some components might be kept as part of the solution even though they are moving away from the optimum. These components are wrongly selected because SPSO considers only the overall improvement of the entire vector. Thus, CPSO [100] was introduced to tackle this problem.
CPSO proposed two models denoted as CPSO-Sk and CPSO-Hk. In CPSO-Sk, the entire vector is split into $D$ swarms and each swarm has a 1-D vector. Every single component of the entire vector is optimized by the swarm that it belongs to. In this case, the optimization function cannot be evaluated directly since the evaluation requires knowledge of the entire D-dimensional vector. To handle this, a context vector is invoked to form a vector that acts as a suitable input for the optimization function. The context vector can be formed by taking the values of the particles from each of the swarms and concatenating them to build up the input vector. To evaluate the fitness of all the particles in the $j$th swarm, the $j$th component takes the value of the first particle of the $j$th swarm while the rest of the context vector components are kept constant at their values in the context vector. The same procedure occurs for the rest of the particles in the $j$th swarm. Experimentally, CPSO-Sk has been found to be easily stuck in sub-optimal regions of the search space. Thus, CPSO-Hk, which is a combination of CPSO-Sk and SPSO, is used to overcome this problem.
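The role of the context vector can be illustrated with a short sketch in which each swarm owns exactly one dimension. The function names are hypothetical, and the way the context vector is filled for the other dimensions (for example, with the best value found so far by each of the other swarms) is left to the caller as an assumption.

```python
def evaluate_in_context(fitness, context, j, value):
    """Evaluate a 1-D particle `value` of swarm j by substituting it into the context vector."""
    trial = list(context)   # copy so the shared context vector is not modified
    trial[j] = value        # only component j changes; the others stay fixed
    return fitness(trial)


def best_in_swarm(fitness, context, j, swarm_values):
    """Return the 1-D particle of swarm j with the lowest fitness in context (minimization)."""
    return min(swarm_values,
               key=lambda v: evaluate_in_context(fitness, context, j, v))
```

For a sphere function and the context vector `[1.0, 2.0, 3.0]`, evaluating the value 0.0 for swarm 1 gives a fitness of 10.0, because only the second component is replaced while the other two stay at 1.0 and 3.0.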
In [101], a new CPSO variant with the concept of dimension partition and adaptive velocity control was proposed. The new variant was dedicated to optimizing multimodal functions by combining a two-swarm cooperative technique with adaptive velocity control. In this work, the population is split into two swarms where the SPSO is applied to the first swarm to perform a full-dimensional search and a single-dimensional PSO is applied to the second swarm to perform a 1-D search. Information is shared between the two swarms in a communication phase. Unlike the conventional CPSO, the two swarms in this new CPSO variant work concurrently. As for the adaptive velocity control, $V_{max}$ is changed dynamically based on how each particle flies in the search space. This new CPSO variant showed better performance when compared with other variants for most of the tested problems.
The work presented in [102] combined CPSO with inertia weight adaptation to produce a new PSO variant named adaptive cooperative PSO (ACPSO). This method implemented the CPSO that was presented in [100] and provided an adaptive method that automatically controls the inertia weight. ACPSO was tested only on three benchmarking functions and the results showed that its solution quality and convergence behavior are better than those of CPSO for all three tested functions. However, the performance of ACPSO still needs to be thoroughly investigated using other benchmarking functions to prove its effectiveness.

2) MULTI-SWARM PSO
The concept of multi-swarm PSO (MSPSO) has been applied in several PSO research works. In MSPSO, the population of particles is split into sub-swarms where each sub-swarm carries out a specific task. A sub-swarm task might be adjusted as time goes on and information is shared among sub-swarms.
One of the works that considered the use of the MSPSO concept was presented in [103]. This work presented a multi-swarm cooperative PSO (MCPSO) that divides the population into one master swarm and multiple slave swarms. Each slave swarm performs an independent single PSO run to control the diversity of the population whereas the formation of the master swarm depends on its own experience as well as the slave swarm experience. In MCPSO, the master swarm can update its particles by either a sequence of competitions or a sequence of collaborations with the slave swarms. The first case is known as the competitive MCPSO while the second is called the collaborative MCPSO. The performance of MCPSO was evaluated on six benchmarking functions and results have demonstrated that it can perform better than the SPSO [103,104].
A Multi-swarm Self-adaptive CPSO (MSCPSO) was proposed in [105]. The total population in MSCPSO is split into four sub-swarms where information is shared among themselves. MSCPSO applied three strategies namely cooperative, diversity, and self-adaptive strategies to escape from becoming stuck in local optima, enhance diversity, and obtain better solutions. An attractive feature of this algorithm is that it does not add any complexity to the SPSO algorithm. In other words, its implementation is as simple and easy as the SPSO. MSCPSO was examined only on six benchmarking functions for 10 and 30 dimensions. Although MSCPSO has shown good performance on the six tested benchmarking functions in the cases of 10 and 30 dimensions, there is no proof that this algorithm can show good performance in the case of high-dimension search space or when other benchmarking functions are tested.
A tribal PSO (TPSO) is proposed in [106] where the population is split into several tribes or sub-swarms using a self-clustering algorithm. The process of the TPSO algorithm consists of four major steps: initializing population, using a clustering algorithm to generate tribes, performing the evaluation step where the performance of each particle is evaluated, and finally using the tribe's adaptation method to add and delete particles.

3) HYBRID PSO
In the field of meta-heuristics, hybridization is the process of selecting the best properties of two distinct algorithms that can solve the same problem and joining them together to come up with a novel algorithm that can achieve better results than the individual algorithms. PSO has been hybridized with many evolutionary algorithms such as GA, DE, and ACO to overcome its drawbacks, such as premature convergence. The hybridization of PSO with GA, DE, ACO as well as with other techniques is presented in the following.

HYBRIDIZATION OF PSO WITH GA
GA was initially introduced by John Holland [107] as one of the earliest evolutionary algorithms. Combining PSO with GA is a popular approach that has been widely considered due to its superior convergence performance compared with PSO and GA individually.
A hybrid PSO and GA (GA-PSO) was proposed in [108] to solve multimodal problems. The process of GA-PSO starts by creating a population of size $4N$ for a problem with $N$ dimensions. The fitness of each individual is calculated, and individuals are ranked based on their fitness values. The selection, crossover, and mutation operators of GA are applied to the best $2N$ individuals whereas PSO is applied to the worst $2N$ individuals. This hybrid approach was tested on seventeen multimodal functions and it has shown better performance in terms of solution quality and convergence speed when compared with the continuous genetic algorithm (CGA) [109] and Nelder-Mead PSO (NMPSO) [110]. In [111], two hybrid algorithms named GA-PSO and PSO-GA were introduced. In GA-PSO, the PSO initial population is created by GA, whereas in PSO-GA, the GA initial population is created by PSO. It has been observed that PSO-GA performs better than GA-PSO, SPSO, and GA.
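The ranking-and-splitting step of the GA-PSO in [108] can be sketched as follows, assuming a minimization problem and a population of 4N individuals for an N-dimensional problem, with the best 2N individuals handed to the GA operators and the worst 2N to PSO. The function name `split_for_ga_pso` is hypothetical, and the GA and PSO operators themselves are not shown.

```python
from typing import Callable, List, Sequence, Tuple

Individual = List[float]


def split_for_ga_pso(
    population: Sequence[Individual],
    fitness: Callable[[Individual], float],
    n_dims: int,
) -> Tuple[List[Individual], List[Individual]]:
    """Rank 4N individuals and split them into two halves.

    Returns (ga_group, pso_group): the best 2N individuals, to be
    processed by GA operators, and the worst 2N, to be moved by PSO.
    Lower fitness is treated as better (minimization).
    """
    if len(population) != 4 * n_dims:
        raise ValueError("expected a population of size 4N")
    ranked = sorted(population, key=fitness)
    half = 2 * n_dims
    return ranked[:half], ranked[half:]
```

For a 1-D problem with the four individuals `[[3.0], [-1.0], [0.5], [2.0]]` and a sum-of-squares fitness, the GA group is `[[0.5], [-1.0]]` and the PSO group is `[[2.0], [3.0]]`.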
The work in [112] combined PSO with GA for field development optimization. The resultant hybrid algorithm is called genetical swarm optimization (GSO). In this hybrid algorithm, the population is split into two portions and it is reconstructed by GA and PSO operations in every iteration. A hybridization constant (HC) was introduced to indicate the population percentage that is constructed with GA, where HC = 0 indicates that only PSO is used and HC = 1 indicates that only GA is implemented.
In [113], a hybrid PSO and GA named HPSOGA was proposed. In this approach, the population is split into two groups based on a hybrid probability $p$. The size of the first group is $p \times N$, where $N$ is the number of particles in the whole population, and the size of the other group is $N - (p \times N)$. The first group updates its particles' positions by PSO while particles in the other group are updated by the three GA operations: selection, crossover, and mutation. HPSOGA showed better performance than SPSO.

HYBRIDIZATION OF PSO WITH DE
DE is a population-based algorithm that was first presented by R. Storn and K. Price [114] in 1995 to solve optimization problems. The selection, mutation, and crossover operators of GA are also used in DE but they function differently. One of the advantages of DE is that it maintains diversity; however, unlike PSO, it is unable to keep track of the process history [115]. In [115], a hybrid DE with PSO (DEPSO) algorithm was proposed to solve economic dispatch problems. The overall procedure of this proposed algorithm is based on DE, with PSO generating a second mutant operator. DEPSO showed its effectiveness in producing good solutions and efficient computation. DE and enhanced PSO (EPSO) were hybridized in [116] where they are executed in parallel and information is exchanged frequently. This approach was applied to design antenna arrays. DEPSO achieves a better global search than the individual DE and EPSO. In [117], a hybrid approach that combines PSO and DE is developed. In this approach, each of the PSO iterations is followed by applying the three operators of DE (mutation, recombination, and selection) to the best personal positions. During the mutation procedure, six DE mutation techniques can be used. After that, a tournament is conducted to select the best position.
A hybrid PSO and DE (PSO-DE) was proposed in [118] to find the optimal design of water distribution systems. The basic idea behind this approach is that DE is not integrated with PSO at all iterations but only at a predefined interval of iterations. The results of PSO-DE, in solving three water distribution problems, showed better solution accuracy and computation efficiency than PSO. To confirm the effectiveness of PSO-DE, it should be used to solve more complex optimization problems and be compared with DE. In [119], the authors proposed a hybrid PSO and DE (DE-PSO) that is divided into two alternating phases, a DE phase and a PSO phase. This hybrid version begins with the DE phase until a trial vector is created. The trial vector is added to the population if it satisfies a predefined requirement; otherwise, the proposed algorithm switches to the PSO phase and creates a new potential solution. DE-PSO is evaluated on several numerical benchmarking problems and the results have shown that DE-PSO outperforms the standard PSO and DE. In [120], a hybrid quantum PSO (QPSO) [121] with DE named DEQPSO is presented to solve a route planning problem. The first step in DEQPSO is to update the population by PSO and then activate the DE algorithm. This proposed algorithm introduced a new form of vectors called the donor vector, which makes the DE in this algorithm somewhat different from the classical DE. Based on simulation results, DEQPSO outperforms QPSO and DE in terms of optimal solution and convergence speed.
In [122], a hybrid algorithm based on PSO and ACO was proposed. The developed hybrid algorithm is named the hybrid ant particle optimization algorithm (HAP). In each HAP iteration, separate executions of PSO and ACO are performed resulting in a new solution for PSO and another new solution for ACO. The best solution out of these two solutions is chosen to be the global best of the overall system. Particle and ant positions are updated based on the parameters of this obtained global best. HAP has shown that it can achieve better solutions as compared with SPSO and ACO. However, HAP was tested only on simple and low-dimensional benchmarking functions. Its performance in complex high-dimensional optimization problems needs to be investigated. A new hybrid method consisting of PSO and ACO that is used for energy optimization was proposed in [127]. The concept of this approach is to update the direction operator of movement if the best solution of ACO is affected by the best solution of PSO. This hybrid approach was also used in [128] to tune the controller coefficients in wind power plants. In [129], a hybrid PSO with ACO is proposed for the economic dispatch of a power system.
A novel hybrid of PSO and the gravitational search algorithm (GSA), named HPSO-GSA, is proposed in [130]; it was tested on only five benchmark functions. Results have shown that HPSO-GSA performs better than PSO and GSA individually for all five selected benchmarking functions. Another combination of PSO and GSA is the gravitational particle swarm (GPS) [131]. In GPS, the velocities and positions of particles are updated based on the velocity of PSO as well as the acceleration of GSA. The results have demonstrated that GPS outperforms SPSO and GSA. However, the parameter setting in GPS is not optimal. Thus, further work is needed to produce better results through efficient parameter tuning. An improved hybrid version of PSO and GSA called centripetal accelerated PSO (CAPSO) was introduced in [41]. In CAPSO, the standard velocity of PSO shown in (1) is modified by adding two terms called acceleration and centripetal acceleration. This modification is introduced to accelerate the convergence speed and protect the algorithm from becoming trapped in local optima.
A hybrid algorithm that consists of PSO and the Legendre pseudo-spectral method (LPM), namely PSO-LPM, was proposed in [132]. PSO-LPM was used to solve planning problems. PSO-LPM starts the search process with the PSO algorithm only and it switches to the LPM algorithm if it finds that the change in the fitness function has become smaller than a predefined value. This hybrid approach provides better convergence speed and global search than both the separate PSO and LPM. In addition, its performance is not affected by random initialization. In [133], PSO was combined with the Lévy flight distribution method resulting in a new PSO variant called Lévy flight PSO (LFPSO). LFPSO alters the SPSO by adding two new ideas. The first idea is giving each particle a limit value. In each iteration of LFPSO, if a particle does not provide better solutions, the limit value is increased by 1. The second idea is using the Lévy distribution method to reallocate the positions of particles that have exceeded the limit value. These two ideas aim to enhance the global search capability and avoid premature convergence to local optima. It has been demonstrated that LFPSO outperforms other PSO variants including CLPSO and HPSO-TVAC as well as other optimization methods like GA and DE.
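The two LFPSO ideas, a per-particle counter and a Lévy-based reallocation, can be sketched roughly as below. Several choices here are assumptions rather than details taken from [133]: the Lévy steps are drawn with Mantegna's algorithm, the counter is reset to 0 whenever a particle improves, and the names `levy_step`, `update_trial_counter`, `maybe_relocate`, and the `scale` factor are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def levy_step(beta: float = 1.5, rng: Optional[random.Random] = None) -> float:
    """Draw one heavy-tailed step using Mantegna's algorithm (assumed)."""
    rng = rng or random.Random()
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def update_trial_counter(improved: bool, trials: int) -> int:
    """Increase the counter by 1 on failure; reset on success (assumed)."""
    return 0 if improved else trials + 1


def maybe_relocate(
    position: Sequence[float],
    trials: int,
    limit: int,
    scale: float = 0.01,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], int]:
    """Relocate a particle with Levy steps once its counter exceeds limit.

    Returns the (possibly unchanged) position and the updated counter.
    """
    if trials <= limit:
        return list(position), trials
    rng = rng or random.Random()
    moved = [x + scale * levy_step(rng=rng) for x in position]
    return moved, 0
```

Because the Lévy distribution is heavy-tailed, most relocations are small but occasional long jumps occur, which is the property that helps stagnant particles leave a local optimum.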

OTHER HISTORICAL PSO VARIANTS
In [134], a modified PSO with time-varying acceleration coefficients (MPSO-TVAC) was presented. This algorithm proposed a new parameter, denoted here as $R_{best}$, which provides additional information to each particle. As a result, better exploration is achieved, which helps to avoid premature convergence. In this method, each particle chooses the $P_{best}$ of a random particle, other than its own $P_{best}$, and considers it as its own $R_{best}$. The velocity update equation of this algorithm extends Equation (1) with a third attraction term scaled by $c_3 r_3$, where $c_3$ is an acceleration constant which attracts each particle to move in the direction of $R_{best}$, and $r_3$ is a uniform random value in the range [0,1]. In MPSO-TVAC, the acceleration coefficients $c_1$ and $c_2$ are varied with time and their formula is provided in [40], whereas $c_3$ is computed from a formula given in [134] that depends on the current iteration number. In [135], a novel Gaussian PSO named Gaussian-distributed PSO (GDPSO) was presented. In GDPSO, the position of a particle is updated based on Gaussian distribution. This method does not require parameter tuning and its performance in solving high-dimension complex functions is superior to Gaussian PSO (GPSO) [136].
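The MPSO-TVAC velocity update can be sketched as a standard PSO velocity with one extra attraction term towards the personal best of a randomly chosen other particle. This is a minimal sketch under stated assumptions: the names `rbest`, `pick_rbest`, and `velocity_with_random_best` are hypothetical, the coefficient `c3` is simply passed in because its time-varying formula is not reproduced here, and fresh random numbers are drawn per dimension.

```python
import random
from typing import List, Optional, Sequence


def pick_rbest(
    pbests: Sequence[Sequence[float]],
    i: int,
    rng: Optional[random.Random] = None,
) -> Sequence[float]:
    """Personal best of a randomly chosen particle other than particle i."""
    rng = rng or random.Random()
    candidates = [k for k in range(len(pbests)) if k != i]
    return pbests[rng.choice(candidates)]


def velocity_with_random_best(
    v: Sequence[float],
    x: Sequence[float],
    pbest: Sequence[float],
    gbest: Sequence[float],
    rbest: Sequence[float],
    w: float,
    c1: float,
    c2: float,
    c3: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Standard PSO velocity plus a third term pulling towards rbest."""
    rng = rng or random.Random()
    new_v = []
    for d in range(len(x)):
        r1, r2, r3 = rng.random(), rng.random(), rng.random()
        new_v.append(
            w * v[d]
            + c1 * r1 * (pbest[d] - x[d])   # cognitive term
            + c2 * r2 * (gbest[d] - x[d])   # social term
            + c3 * r3 * (rbest[d] - x[d])   # extra random-best term
        )
    return new_v
```

When a particle already sits on its personal best, the global best, and the selected random best, all three attraction terms vanish and only the inertia term `w * v` remains.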
Based on the grey relational analysis, Leu and Yeh [137] proposed a PSO variant termed grey PSO. In each iteration of grey PSO, each particle is assigned a unique inertia weight, a cognitive component, and a social component. This algorithm achieves faster convergence speed and better solution accuracy as compared with PSO-LVIW [45], HPSO-TVAC [40], and APSO [65].
The work in [138] proposed an enhanced PSO incorporating a weighted particle (EPSOWP). EPSOWP calculates a weighted particle that guides the particles of a swarm towards the optimal solution. Based on simulation results, EPSOWP outperforms the SPSO, GA, and DE algorithms on some selected benchmarking functions. In [139], a team-oriented swarm optimization (TOSO) is proposed where the swarm is divided into two teams. The role of the first team is to perform exploration while the second team performs exploitation. The two teams interact with each other by sharing information about Gbest. This PSO variant omits the need for the inertia weight, cognitive coefficient, and social coefficient. Instead, it relies on only one parameter known as mutation probability (pm). Unlike most of the PSO variants, this variant was tested on very high-dimensional cases (up to 1000 dimensions). Although TOSO has shown good performance for various benchmarking functions, it still has drawbacks in terms of its exploration capability.
To avoid the problem of the premature convergence of SPSO while maintaining fast convergence, PSO with aging leader and challengers (ALC-PSO) was presented in [37]. In ALC-PSO, the swarm's leader possesses a lifespan that can be adjusted by the leader's leading power (stronger leading power indicates longer life for the leader) and its age increases with time. The other particles of the swarm (challengers) have the chance to claim the leadership once the leader has become old. The leader attracts other particles if its leading power is high; otherwise, new particles are allowed to compete to take the leadership. A median-oriented PSO (MPSO) was introduced in [140] to avoid becoming trapped in local optima and to accelerate the convergence speed. In this approach, each particle updates its velocity based on the current velocity and a median-oriented acceleration. This variant omits the need for the inertia weight $w$, cognitive coefficient $c_1$, and social coefficient $c_2$. Another PSO variant called orthogonal learning PSO (OLPSO) was proposed in [42]. This PSO variant uses an orthogonal learning strategy for PSO to achieve faster convergence speed and better solution quality. The role of the orthogonal learning strategy is to let the particles move in better directions. The results of OLPSO demonstrated its superiority in terms of convergence speed and solution quality as compared to the SPSO and some other PSO variants.

IV. BINARY PSO
Kennedy and Eberhart introduced the binary version of PSO (BPSO) in [141]. The BPSO is applied to solve binary problems where each dimension of a particle can have two states only: 0 or 1. The values of 1 and 0 can have different meanings such as true or false, yes or no, selected or not selected, respectively. The updated velocity $v(t+1)$ in BPSO is the same as the updated velocity in the continuous PSO but with restricting the values of $x$, $P_{best}$, and $G_{best}$ to binary values. The result of $v(t+1)$ is still a real continuous value, though. The value of $v(t+1)$ can be mapped to the range [0,1] by a transfer function. One of the most common transfer functions is the sigmoidal function which is given as follows:

$$T(v(t+1)) = \frac{1}{1 + e^{-v(t+1)}}$$

Similar to PSO in continuous search space, a particle updates its velocity in BPSO using Equation (1). In BPSO, a particle updates its position based on a probabilistic equation given by:

$$x(t+1) = \begin{cases} 1, & \text{if } r_4 < T(v(t+1)) \\ 0, & \text{otherwise} \end{cases}$$

where $r_4$ is a uniformly distributed random value in the interval [0,1]. From Equation (17), it is observed that $T(v(t+1))$ becomes approximately 0 when the value of $v(t+1)$ is less than -10. At this state, the updated position $x(t+1)$ will remain 0 and no bit flip will occur. Similarly, $T(v(t+1))$ becomes approximately 1 when the value of $v(t+1)$ is greater than 10 and the updated position $x(t+1)$ will remain 1. In [141], it is recommended to limit the velocity to ±6, where there will be a probability of 0.0025 for bits to be flipped. The work presented in [142] recommended tighter values to limit the velocity (±4).
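A minimal sketch of the BPSO position update with a sigmoid transfer function and velocity clamping is given below. The function names `sigmoid`, `clamp`, and `bpso_position` are hypothetical, and the default limit of 6 follows the recommendation attributed to [141] above.

```python
import math
import random
from typing import List, Optional, Sequence


def sigmoid(v: float) -> float:
    """Sigmoid transfer function mapping a velocity to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def clamp(v: float, v_max: float = 6.0) -> float:
    """Limit a velocity component to the interval [-v_max, v_max]."""
    return max(-v_max, min(v_max, v))


def bpso_position(
    velocity: Sequence[float],
    v_max: float = 6.0,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample a new binary position from the clamped velocities.

    Each bit becomes 1 with probability sigmoid(v) and 0 otherwise.
    Clamping first also keeps math.exp from overflowing.
    """
    rng = rng or random.Random()
    bits = []
    for v in velocity:
        prob_one = sigmoid(clamp(v, v_max))
        bits.append(1 if rng.random() < prob_one else 0)
    return bits
```

With the limit at 6, `sigmoid(-6.0)` is about 0.0025, so a bit whose velocity is pinned at the lower limit still has a small chance of being set to 1, which preserves some exploration.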
Unlike the continuous PSO, only limited research efforts have attempted to modify the standard BPSO to enhance its performance. As discussed earlier, the velocity in BPSO should be limited to ±6 or ±4. In [143], an essential binary particle swarm optimization (EPSO) is proposed based on the idea of omitting the velocity component of PSO. Thus, there is no need to limit the velocity. The EPSO adopted the concept of queen informants in ACO and applied it in PSO resulting in a modified form of EPSO denoted as EPSOq. In EPSOq, a new informer named the queen informer is added; it is updated after each loop and its role is to provide information to other particles. The EPSO and EPSOq were applied to solve two suites of test functions and EPSOq showed better performance in terms of convergence rate and solution quality as compared to the standard BPSO and EPSO. However, the results of EPSOq are not optimal.
To overcome the problem of nonlinearity that results from the sigmoid function and the problem of the unusual behavior of the probability function of a bit-change, an improved binary particle swarm optimization (IBPSO) is proposed in [144]. In IBPSO, the XOR and AND operators are used in the velocity update equation for $v(t+1)$, and the updated position depends on the current position. Utilizing the genotype-phenotype concept, a modified binary particle swarm optimization is introduced in [145]. In this approach, the standard BPSO is modified by letting the velocity and the position act as a particle and a solution. The position in the velocity update equation is a phenotype and the updated position equation is a genotype that depends on the current phenotype's position. This modified binary version is evaluated on ten benchmarking functions and the results have demonstrated that its performance is better than the standard BPSO. A novel binary particle swarm optimization (PBPSO) was proposed in [146] to address the problem of the long time spent by the sigmoid function. In [147], an adaptive mutation operator was added to the PBPSO resulting in a new binary variant called adaptive mutation PBPSO (AMPBPSO). In AMPBPSO, the new binary position update is based on an adaptive mutation probability which is evaluated by measuring the distance between the new binary position and its best position. The introduced adaptive mutation operator helps to maintain diversity and enhance local search.
The BPSO has difficulty converging to the best solution because the binary positions are based on randomness. In addition, BPSO suffers from becoming trapped in local minima [141], [148]. A V-shaped transfer function is used in [149] instead of the S-shaped transfer function to avoid excessive randomness. In [149], the new binary position depends on the V-shaped transfer function and it has three transition states: it stays in its current position, changes its value to 1, or changes its value to 0. In this case, the randomness of binary positions is reduced. Though this method is capable of reducing the randomness of binary positions, it is incapable of solving the problem of convergence to local minima. To avoid this latter problem, an enhancement to the work done in [149] was proposed in [150] by adding a mutation operator. The overall framework in [150] consists of a V-shaped transfer function, a new updating position formula, and a mutation operator. This combination enhances the convergence rate and diversity of particles and it also helps to escape from the local minima problem.
In [151], six new S-shaped and V-shaped transfer functions were introduced and tested on twenty-five benchmark functions. Based on the results, the V-shaped family of transfer functions outperforms the S-shaped family in terms of the convergence rate and escaping from local minima. Therefore, it is recommended to use the V-shape family, particularly the V4 transfer function, to enhance the standard BPSO performance.
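A V-shaped transfer function can be illustrated with the sketch below. Two points are assumptions, since the source text does not print them: the V4 function is taken in the form commonly given for this family, $|\frac{2}{\pi}\arctan(\frac{\pi}{2}v)|$, and the position rule flips the current bit with the transfer probability and otherwise keeps it, which is the rule usually paired with V-shaped functions. The names `v4_transfer` and `v_shaped_update` are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def v4_transfer(v: float) -> float:
    """Assumed V4 form: |(2/pi) * atan((pi/2) * v)|, valued in [0, 1)."""
    return abs((2.0 / math.pi) * math.atan((math.pi / 2.0) * v))


def v_shaped_update(
    bits: Sequence[int],
    velocity: Sequence[float],
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Flip each bit with probability v4_transfer(v); otherwise keep it."""
    rng = rng or random.Random()
    new_bits = []
    for b, v in zip(bits, velocity):
        flip = rng.random() < v4_transfer(v)
        new_bits.append(1 - b if flip else b)
    return new_bits
```

Unlike the S-shaped case, a velocity near zero leaves the bit unchanged instead of assigning 0 or 1 with equal probability, which is one reason the V-shaped family is reported to reduce the randomness of binary positions.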
In [152], the velocity update equation is modified to have three different equations for three different cases: when $P_{best} = G_{best} = 1$, $P_{best} = G_{best} = 0$, and $P_{best} \neq G_{best}$. The velocity increases if $P_{best} = G_{best} = 1$, decreases if $P_{best} = G_{best} = 0$, and remains unchanged when $P_{best} \neq G_{best}$. This is justified by the consensus between $P_{best}$ and $G_{best}$ in the first two cases, whereas the third case lacks this consensus. The proposed algorithm showed superior performance compared with other BPSO variants. A hierarchical BPSO (BPSOHS) inspired by multilevel learning behavior was proposed in [153]. Particles in the proposed approach are split into two groups: leaders and followers. In BPSOHS, the leaders' velocity and position updates are the same as the standard BPSO while the followers' velocities and positions are updated based on a random walk probability and a decision from the leaders. The idea of this method is to enable followers to fly towards leaders and at the same time to explore an extensive region near the leader space. Moreover, a mutation technique is implemented in order to avoid premature convergence.
Utilizing the sigmoid transfer function, the author in [154] proposed a new binary PSO version where the PSO acceleration coefficients are modified based on the fitness of each particle. The effectiveness of the proposed approach was tested on four problems in the continuous search space and its performance in optimizing binary problems was not validated. In addition, the number of independent runs is only 10, which is not enough to produce accurate results. At least 30 independent runs are needed to validate the performance of the proposed binary variant. Recently, the work in [155] converted the gaining-sharing knowledge-based continuous algorithm [156] into a novel binary PSO variant where both algorithms are based on the idea of gaining-sharing knowledge that humans experience during their lifespan. The new binary variant is tested on twenty-two feature selection benchmark datasets and its performance is compared with the standard binary PSO and other well-known binary optimization algorithms such as binary GWO and binary salp swarm algorithm. The proposed approach was only tested on feature selection problems while its performance on multidimensional knapsack problems was not investigated. Thus, it would be interesting to study the performance of this binary variant when it solves multidimensional knapsack problems.
The work in [157] hybridized the binary PSO with the sine cosine algorithm to solve feature selection problems. A V-shaped transfer function is used and the performance of the hybrid variant is compared with some well-known binary PSO algorithms including the standard one as well as with other high-performance binary algorithms such as binary whale optimization algorithm and binary moth flame optimization algorithm. Although the hybrid approach has shown good performance, all results are obtained for 10 independent runs only, which is not enough to achieve high accuracy. The authors in [158] developed a hybrid approach that combines binary PSO with tabu search to solve the set-union knapsack problem. The performance of this hybrid approach on feature selection problems has not been investigated yet. Although [157] and [158] have achieved remarkable performance, this achievement comes at the expense of complexity.
A new binary PSO variant that is designed to solve feature selection problems is proposed in [159]. The idea of the proposed algorithm is to divide the entire population into sub-swarms where each sub-swarm implements a unique inertia weight strategy. Although the proposed approach has shown better classification performance compared with the binary PSO, GA, and binary GSA, it requires more computational time than the standard binary PSO. In [160], a time-varying mirrored transfer function is proposed and its performance is evaluated on CEC 2005 benchmark functions as well as on 0-1 multidimensional knapsack problems. Results have shown that the proposed transfer function outperforms the S-shaped and V-shaped transfer functions. The performance of this new transfer function when used by other meta-heuristic algorithms has not been studied yet. Thus, more research work is needed to further validate the effectiveness of this mirrored transfer function. In addition, its performance on feature selection problems needs to be investigated.

A. BPSO TRANSFER FUNCTIONS
The role of a transfer function is to map the velocity of a certain dimension of a particle into the probability of bit flipping. According to [161], three rules must be followed when selecting a transfer function:
• The probability of changing a bit from 0 to 1 or vice versa must be high for large absolute values of velocities.
• The probability of leaving a bit unchanged must be high for small absolute values of velocities.
• The outcome of a transfer function should be in the range of [0,1] as it acts as a probabilistic function.
Some transfer functions have been proposed in the literature such as the sigmoid function, the S-shaped family, and the V-shaped family [151]. Table IV lists the most common transfer functions that can efficiently convert a continuous search space into a binary one. The performance of binary algorithms is highly dependent on the selection of the transfer function. Thus, it is crucial to investigate the performance of new binary variants when different transfer functions are used to figure out which transfer function is the most suitable for each variant. Some of the transfer functions listed in Table IV were originally proposed for binary meta-heuristic algorithms other than binary PSO. However, they can be implemented in binary PSO, and their performance on binary PSO remains to be studied.

V. VALIDATION OF NEW PSO VARIANTS
This part focuses on the steps that are required to validate the effectiveness of new PSO variants. These steps can be summarized as follows:

1. Development of a novel approach based on new ideas, parameter modifications, or hybridizations
The first step when developing a novel PSO variant is introducing new ideas, particularly concepts that can help to balance exploration and exploitation. The most common concepts that help improve the performance of PSO are modification of controlling parameters, particularly the inertia weight, hybridizing PSO with other prominent meta-heuristic algorithms, and multi-swarm approaches.

2. Testing the novel PSO variant on a wide range of benchmarking functions
The next step is to validate the performance of the new PSO variant on several unimodal, multimodal, and composite benchmarking functions. The most common classical benchmarking functions consist of twenty-three unimodal and multimodal functions that are widely used by researchers [124,151,166,167]. Although these functions can validate the exploration and exploitation abilities of a certain PSO variant, they do not fully represent real-world optimization problems since they are unconstrained problems. To represent real-world problems that contain a number of constraints, the CEC2017 test suite was introduced. Therefore, a strong PSO variant should be able to provide significant improvements when dealing with CEC2017. Other widely used benchmarking function suites are CEC2005 and CEC2019.

3. Testing the new PSO variant on real-world engineering problems
This is a crucial step to demonstrate the effectiveness of a proposed PSO variant. Real-world optimization problems are challenging since they have a number of constraints that must be satisfied. The introduction of constraints divides particles into valid and invalid particles. A valid particle is one that meets all constraints whereas a particle is considered invalid if it violates one or more constraints. One of the most common ways to penalize a particle when it does not satisfy all constraints is to assign its fitness a large value such as $10^{12}$ when solving a minimization problem. The most widely used engineering problems that serve as benchmarks to test the performance of a new optimization algorithm are welded beam design, speed reducer design, pressure vessel design, and tension/compression spring design.

4. Comparison with well-known PSO variants and other meta-heuristic approaches
The fourth step is to compare the performance of the developed PSO variant with other prominent PSO variants. However, this is not enough, as the performance of the new PSO variant must also be compared with other outstanding meta-heuristic algorithms since their performance might be better on a certain set of functions compared with existing PSO variants.

5. High dimensional performance
A PSO variant might show strong performance on low-dimensional problems yet perform poorly on high-dimensional ones, since the performance of a PSO variant usually degrades as the number of dimensions increases. It is therefore essential to validate the effectiveness of the new variant on both low- and high-dimensional problems.

6. Sensitivity analysis
PSO controlling parameters have a direct influence on optimization performance. Some new PSO variants may add new parameters besides the three controlling parameters of the original PSO. Thus, it is crucial to provide a sensitivity analysis that illustrates the influence of these parameters on the performance of the new variant. In addition, it is important to show which parameters are sensitive to different settings and which parameters are robust.

7. Convergence analysis
Although the average fitness and standard deviation are two important metrics that help to validate the effectiveness of an optimization algorithm, a convergence analysis is required to further demonstrate the ability of an optimization algorithm to escape from local optima and converge to a global one.

8. Statistical significance analysis
Statistical significance analysis is an essential step that needs to be performed to show that the performance difference between a new PSO variant and other existing PSO variants or meta-heuristic algorithms is statistically significant. In the literature, a significant number of non-parametric statistical tests help to demonstrate the superiority of one algorithm over others. The Wilcoxon rank-sum test and the Friedman test are the two most common statistical tests used to evaluate the performance of meta-heuristic algorithms.
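The Wilcoxon rank-sum test mentioned in the last step can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: it uses the normal approximation without tie or continuity corrections, so it is only indicative for small samples, and the function names `average_ranks` and `rank_sum_test` are hypothetical. In practice, the inputs would be the final fitness values of two algorithms over repeated independent runs.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Returns (z, p). A negative z means sample a tends to have
    smaller values (better, for minimization) than sample b.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])                      # rank sum of sample a
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    return z, p
```

A p-value below a chosen significance level, commonly 0.05, suggests that the difference between the two sets of results is unlikely to be due to chance alone.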

VI. APPLICATIONS OF PSO
Due to its simplicity and robustness, PSO has been widely used as an efficient optimization tool for solving various optimization problems in many real-world applications such as feature selection, wireless communications, image processing, and electrical power systems. The following subsections present the applications of PSO in the aforementioned fields.

A. APPLICATIONS OF PSO TO FEATURE SELECTION
This part focuses on the applications of PSO to feature selection problems. It starts with an introduction to feature selection followed by a detailed explanation of how PSO is applied to solve feature selection problems. Finally, PSO-based feature selection studies are reviewed.

1) FEATURE SELECTION
Feature selection is a process that aims to select $m$ features from $n$ original features ($m < n$) to optimize a certain metric [168], [169], [170]. Feature selection is a crucial process in machine learning and data mining as it can significantly help to remove unnecessary and redundant features [171]. For a large number of features, finding the optimal number of features is a complicated problem [172]. Generally, the selection of features is used for four reasons: simplifying data, reducing computational time, avoiding the curse of dimensionality, and reducing overfitting. Figure 1 illustrates the feature selection process, which goes through five steps: initialization, generation, evaluation, stopping criteria, and validation. In the initialization step, the number of all original features represents the dimensionality of the search space. The second step is responsible for selecting the best subset of features. Various searching approaches such as conventional schemes and meta-heuristic algorithms can be utilized to perform this task. Typically, searching can start with no features, all features, or a random selection of a subset of features [173], [174], [175]. The subsets selected in the second step are evaluated in step three to check their goodness. The fourth step requires good stopping criteria that terminate the search when good performance is achieved. The final step validates the effectiveness of the obtained subset of features on a test set.
Figure 2 shows the key factors of feature selection, which include the searching algorithm, the number of objectives, and the evaluation measures. The first key factor of feature selection is the searching algorithm that attempts to find the best subsets of features. Feature selection is an NP-hard problem, particularly for large datasets, as it has $2^n$ possible solutions where $n$ denotes the number of original features. Thus, searching algorithms play an important role in solving feature selection problems since they can achieve remarkable performance with a significant reduction in computational time. The number of objectives represents the second key factor, where either a single objective such as minimizing the classification error rate is considered or multiple objectives such as minimizing the number of features and minimizing the classification error rate are taken into account. Evaluation measures, as the third key factor, use an evaluation function that can determine the strengths and drawbacks of the selected subset, which in turn helps to guide the searching algorithm.
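To make the size of the search space and the single-objective versus multi-objective distinction concrete, the sketch below counts the candidate subsets and folds the two objectives named above (classification error rate and number of selected features) into one value through a weighted sum. This scalarization is only one common option and is not attributed to any of the cited studies; the function names and the weight `alpha` are hypothetical.

```python
def subset_count(n_features):
    """Number of candidate feature subsets for n original features (2^n)."""
    return 2 ** n_features


def weighted_fitness(error_rate, n_selected, n_total, alpha=0.9):
    """Combine two minimization objectives into a single fitness value.

    alpha weights the classification error rate and (1 - alpha) weights the
    fraction of features kept. alpha = 0.9 is an illustrative choice only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


if __name__ == "__main__":
    print(subset_count(30))  # exhaustive search is already impractical here
    # Two hypothetical subsets evaluated by some classifier.
    print(weighted_fitness(error_rate=0.08, n_selected=12, n_total=30))
    print(weighted_fitness(error_rate=0.10, n_selected=5, n_total=30))
```

A true multi-objective approach would instead keep both objectives separate and compare solutions by Pareto dominance.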

FIGURE 2. The three key factors of feature selection
Feature selection approaches can be classified into two main categories: filter and wrapper methods [176], [177], [169]. The main difference between the two is that wrapper approaches implement a classification algorithm to evaluate the goodness of the selected features whereas filter methods do not. As a consequence, wrapper approaches achieve better performance [174], [176], [178], [179]. Some research work [174], [178], [179] adds the embedded approach as a third category of feature selection approaches. In the embedded approach, the classifier and the selected features are integrated. Table V shows the strengths and drawbacks of the filter, wrapper, and embedded approaches.

2) PSO FEATURE SELECTION MECHANISM
Feature selection is a binary optimization problem by nature. To represent a candidate solution to the feature selection problem, a vector whose dimension equals the number of features is needed, where each element of the vector can have a value of either 0 or 1. A value of 0 indicates that a feature is not selected while 1 indicates the selection of the feature [3]. PSO can form a binary vector that can be used to solve the feature selection problem. Optimization problems whose variables are continuous can be turned into binary optimization problems by replacing the continuous variables with binary variables. PSO in its continuous version allows candidate solutions to update their positions where each variable can have a continuous value. In binary optimization, a position is updated by switching its value from 0 to 1 or from 1 to 0. As a result, for PSO to be able to solve feature selection problems, a transfer function is applied to convert the real-valued positions of candidate solutions into binary ones [8]. Transfer functions rely on a probabilistic approach to update binary values from 1 to 0 or from 0 to 1. Several transfer functions have been proposed in the literature, of which the S-shaped and V-shaped transfer functions are the most common [21].
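The probabilistic update described above can be sketched as follows. This is a minimal illustration that assumes the sigmoid as the S-shaped function and |tanh(v)| as one member of the V-shaped family; the function names are hypothetical and other transfer functions from the literature could be substituted.

```python
import math
import random


def s_shaped(v):
    """Sigmoid transfer function: maps a velocity to a probability in [0, 1]."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)  # numerically safe branch for negative velocities
    return e / (1.0 + e)


def v_shaped(v):
    """One common V-shaped transfer function, |tanh(v)|, with values in [0, 1]."""
    return abs(math.tanh(v))


def update_bit_s(v, rng=random):
    """S-shaped rule: set the bit to 1 with probability s_shaped(v)."""
    return 1 if rng.random() < s_shaped(v) else 0


def update_bit_v(bit, v, rng=random):
    """V-shaped rule: flip the current bit with probability v_shaped(v)."""
    return 1 - bit if rng.random() < v_shaped(v) else bit


if __name__ == "__main__":
    rng = random.Random(1)
    velocity = [2.0, -3.0, 0.1, 4.0, -0.5]  # hypothetical velocity vector
    position = [update_bit_s(v, rng) for v in velocity]
    selected = [i for i, bit in enumerate(position) if bit == 1]
    print(position, selected)  # the indices in `selected` are the chosen features
```

With the S-shaped rule the new bit depends only on the velocity, whereas the V-shaped rule keeps the current bit unless the velocity magnitude is large.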

3) FEATURE SELECTION STUDIES BASED ON PSO
PSO has gained significant attention in the domain of feature selection to solve different kinds of problems. For example, the study in [180] proposed an improved BPSO based on Lévy flight as a local search component and an inertia weight coefficient as a global search component, as well as a mutation mechanism for population diversity enhancement. The KNN classifier was used for classification and the sigmoid function for solution mapping. Sixteen classical datasets were used for validation. The findings showed promising performance compared to other benchmarking methods.
In another study [181], the authors used BPSO to address the feature selection problem on input variables for intelligence joint moment prediction. Experimental data gathered from ten electromyography (EMG) data and six joints' angles were used for validation. An ANN classifier was used for classification and the sigmoid function for solution mapping. The findings showed that the proposed approach is able to reduce the number of input variables of five joint moments from 16 to fewer than 11.
In [159], the authors proposed a co-evolution binary particle swarm optimization with a multiple inertia weight strategy. The KNN classifier was used for classification, and the sigmoid transfer function was used to convert the search space into a binary one. Ten benchmark datasets collected from the UCI repository were used for validation and the proposed method was compared against four well-known feature selection methods. The findings demonstrated a competitive performance compared to other methods.
In [182], BPSO was hybridized with differential evolution to solve feature selection issues in EMG signal classification. The EMG signals of ten healthy subjects obtained from a publicly accessible EMG database were used for validation. The discrete wavelet transform was applied to decompose the signals into wavelet coefficients. The sigmoid transfer function was used to obtain the binary representation that feature selection requires, and the KNN classifier was used for classification. The performance of the proposed method was compared against four benchmarking feature selection methods. The findings demonstrated that the proposed method is beneficial for EMG signal classification. In the same domain of EMG signal classification, the work in [183] proposed a new personal best guide BPSO. The discrete wavelet transform decomposes a signal into multiresolution coefficients. As in [182], the sigmoid transfer function and the KNN classifier were used.
Moreover, a study based on an improved BPSO was proposed in [184] to address feature selection problems in gene selection and cancer classification. This approach chooses a low-dimensional set of prognostic genes to classify biological samples of binary and multi-class cancers. The Naive Bayes classifier was used for classification, and the sigmoid transfer function was used to obtain the binary representation that feature selection requires. Eleven microarray datasets of different cancer types were used for validation. Experimental results were benchmarked against seven other well-known methods, and the findings demonstrated that the proposed method performs better in terms of classification accuracy and the number of selected genes.
A hybrid improved PSO with a shuffled frog leaping algorithm was proposed in [185] to address the feature selection problem. Naive Bayes (NB), KNN, and Support Vector Machine (SVM) classifiers were used for classification and the sigmoid function was applied. For validation, a dataset that consists of 1600 reviews of the 20 most popular Chicago hotels was used. The findings revealed that the proposed method attains an optimized feature subset and achieves higher classification accuracy. In another study [186], the authors proposed a new multi-swarm heterogeneous BPSO using a Win-Win method to solve feature selection problems in liver and kidney disease diagnosis.
In [187], the Hamming distance is introduced as a proximity measure that can update the binary PSO velocity and select the important feature subsets. Experimental results on three benchmark datasets are evaluated using classification accuracies as well as validity indices. Utilizing rough set theory and its distinction table as a binary table, the work in [188] proposed a hybrid binary PSO variant that implements a statistical elimination strategy that can help to reduce the number of features efficiently. The authors in [189] developed a new multi-swarm PSO variant as a feature optimization technique in facial recognition systems. Results have demonstrated that the new PSO variant can significantly outperform the standard PSO and GA. To achieve better accuracy, another study [190] proposed a fuzzy rule-based binary PSO (FRBPSO) that is designed specifically to solve feature selection problems. Results on benchmark high-dimensional microarray datasets show the merits of the proposed FRBPSO method.
Based on feature subset correlation, the authors in [191] proposed a hybrid PSO with a new local search strategy for feature selection. In the proposed approach, PSO is designed to select features that have low correlation. Results have shown that the proposed PSO achieves higher accuracy compared with filter methods. Utilizing the SVM classifier, PSO is hybridized with GA in [192] as a wrapper feature selection tool to classify microarray data. Considering unreliable data in feature selection and based on bare-bones PSO, a multi-objective PSO approach is proposed in [193]. The work in [194] hybridized PSO with GA to improve feature selection in digital mammogram datasets. Utilizing the KNN classifier, Al-Tashi et al. [195] proposed a hybrid PSO with GWO for wrapper feature selection. This work used the sigmoid transfer function to convert the search space into a binary one.
Multi-objective variants of PSO have been widely applied to solve feature selection problems. For example, the authors in [196] developed a PSO-based multi-objective approach where features are ranked based on their frequency in the set of archives. The proposed multi-objective scheme is compared with three multi-objective PSO variants as well as a multi-objective GA on nine benchmark datasets. Results have shown that the proposed approach is more efficient in reducing the number of features in large datasets, while it achieves a satisfactory performance that is close to the performance achieved by other algorithms when it deals with datasets that have fewer than 100 attributes. Nonetheless, the proposed approach suffers from slow convergence that restricts reaching the optimum Pareto front. Another work that utilizes multi-objective PSO is presented in [197], where a two-step algorithm is proposed for fault diagnosis of power transformers. The first step is responsible for selecting the most important features, whereas the second step generates an ensemble classifier that is formed from the most accurate classifiers. The work in [198] developed a multi-objective PSO feature selection approach to predict the dose of warfarin. The authors of this work applied artificial neural networks to assess the selected features. The developed multi-objective PSO approach is compared with NSGA-II and results have demonstrated that PSO outperforms NSGA-II in terms of accuracy and the minimum number of features selected.
An improved multi-objective version of PSO is developed in [199] to study multi-label feature selection. The authors implemented an adaptive uniform mutation operator to enhance the exploration abilities while a local learning strategy is used to achieve better exploitation. Results have shown that the proposed scheme performs better than NSGA-II in terms of exploration. Based on the filter approach, a multi-objective BPSO is proposed in [200] for feature selection to obtain a non-dominated feature subset that results in a reduction in the number of selected features as well as higher classification accuracy. The work presented in [201] developed two multi-objective algorithms based on PSO (NSPSOFS and CMDPSOFS) for solving feature selection problems. The NSPSOFS algorithm is developed based on the concept of non-dominated sorting in NSGA-II to check the possibility of implementing a simple multi-objective PSO to solve feature selection problems. The second algorithm, known as CMDPSOFS, utilizes three different techniques: mutation, dominance, and crowding. When tested on twelve classical UCI repository datasets, the two algorithms were superior in reducing the number of features and decreasing the classification error rate compared with NSGA-II, the strength Pareto evolutionary algorithm 2 (SPEA2) [202], and the Pareto archived evolutionary strategy (PAES) [203]. Table VI summarizes the existing studies on feature selection using PSO and its variants.
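The notion of a non-dominated feature subset used by the multi-objective studies above can be illustrated with a short sketch. It assumes two minimization objectives, the number of selected features and the classification error rate, and uses a simple quadratic-time filter rather than the sorting procedures of NSGA-II or the cited PSO variants; the function names and data points are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates (O(n^2) filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical (number of features, error rate) pairs for five subsets.
    candidates = [(3, 0.20), (5, 0.10), (5, 0.25), (8, 0.10), (2, 0.30)]
    for point in non_dominated(candidates):
        print(point)
```

The surviving points form the approximation of the Pareto front from which a practitioner picks a trade-off between subset size and accuracy.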

B. WIRELESS COMMUNICATIONS
Since spectrum sensing in cognitive radio networks (CRNs) is a non-convex optimization problem, CRNs utilize PSO algorithms to optimize their performance in terms of energy efficiency, spectral efficiency, and sensing time. The work in [227] implemented the standard PSO algorithm to detect the presence of primary users. According to the simulation results, PSO can save more than 80x of energy consumption as well as sensing time. The performance of the proposed scheme can be further improved by applying enhanced PSO variants. A hybrid PSO-GSA approach is used in [123] to optimize energy efficiency in 5G CRNs. Results have shown that the proposed hybrid approach is more energy efficient than the standard PSO algorithm, Artificial Bee Colony (ABC), the energy detector scheme, and the well-known cooperative spectrum sensing method.
In [213], PSO is used for beamforming optimization in intelligent reflecting surfaces (IRSs) to minimize the transmission power given that the signal-to-noise ratio (SNR) does not go below a certain threshold. Results have demonstrated that PSO can achieve near-optimal beamforming solutions. Considering vehicular ad-hoc networks, a task-distribution PSO is proposed to efficiently distribute tasks among vehicles that belong to the same cluster [228]. The results show that the proposed PSO scheme outperforms GA in terms of overhead reduction while its overhead performance is comparable with linear programming. In [217], an adaptive PSO is developed to solve the clustering problem in ad-hoc networks.
One of the interesting and recent applications of PSO in the wireless domain is edge computing, where intensive computational tasks are offloaded from core networks to the edge that is closer to the user. Considering a smart internet of things (IoT) system, a self-adaptive PSO algorithm that utilizes the GA operators (SPSO-GA) was recently proposed in [214] to develop an energy-efficient approach that can efficiently make offloading decisions for deep neural network (DNN) layers with layer partition operations. Simulation results have shown that the SPSO-GA algorithm outperforms the GA and PSO-GA approaches in terms of energy consumption. The authors in [215] applied a PSO-GA algorithm to minimize the system cost when DNN layers are offloaded over the cloud, edge, and user's devices. According to the results, the proposed PSO-GA can significantly reduce the system cost compared with PSO and GA. Combining edge computing and cloud computing, the work in [216] implemented the BPSO algorithm with the GA operators to minimize data transmission time when the workflow is executed. Although the results have shown the superiority of the proposed approach in reducing data transmission time, data transmission energy is not considered. The works in [214][215][216] can be further improved by applying the recent high-performance PSO variants presented in Section III.
Considering a finite impulse response (FIR) filter, the authors in [229] applied a quantum-behaved PSO algorithm to develop an adaptive channel equalizer. Based on the results, the proposed approach achieves a lower bit error rate compared with GA, SPSO, and the classical least mean square method. The SPSO algorithm and its variants are also applied to design infinite impulse response (IIR) filters [220][221][222]. Another application of PSO in the wireless communications field is antenna array design. In a massive multiple-input multiple-output (MIMO) network, a contraction adaptive PSO algorithm is proposed in [223] to find the optimal positions of antenna array elements that can optimize an antenna's performance when it transmits or receives data. Although the proposed approach has shown good results, its performance is compared with PSO variants only. It is evident from the state of the art presented in this subsection that most of the work has considered single-objective optimization. In wireless communications, it is crucial to consider multiple objectives such as energy efficiency, spectral efficiency, and latency to develop a robust and reliable communication system. Therefore, it is essential to develop novel multi-objective PSO algorithms that take several wireless metrics into account.

C. IMAGE PROCESSING
PSO has been successfully applied to solve many image processing optimization problems in diverse areas such as image segmentation [230][231][232], image enhancement [233], image compression [234], and image watermarking [235]. One of the recent and interesting applications of PSO in image processing is multilevel thresholding image segmentation. The work in [230] modified the standard PSO algorithm to perform image threshold segmentation in lung CT images where the aim is to identify lung tissue. In the proposed scheme, the symmetric disposition is implemented to adjust the positions of particles in each iteration. Although this work has shown fast segmentation speed as well as good segmentation accuracy, its performance is tested on only one lung CT image. In [231], PSO is used to segment medical images to detect brain tumors. This work can be further improved by applying recent robust PSO variants or other well-known meta-heuristic algorithms such as GWO and Equilibrium Optimizer (EO). Considering multilevel image thresholding, the work in [232] hybridized PSO with the firefly algorithm (FA) [236] to search for the optimal threshold values. Based on the results, the proposed hybrid scheme outperforms GA, PSO, and FA in terms of peak signal-to-noise ratio (PSNR).
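To show what searching for optimal threshold values can look like as an optimization problem, the sketch below evaluates a candidate set of thresholds with an Otsu-style between-class variance. This criterion is an assumption chosen for illustration; the cited works may use a different objective (for example, an entropy-based one), and the function name is hypothetical. A PSO particle would encode the thresholds, rounded to integer gray levels, and the swarm would maximize this score.

```python
def between_class_variance(hist, thresholds):
    """Otsu-style objective for multilevel thresholding (to be maximized).

    hist[g] is the pixel count at gray level g. The sorted thresholds split
    the gray levels into len(thresholds) + 1 classes; a threshold t places
    levels below t in the lower class.
    """
    total = sum(hist)
    global_mean = sum(g * c for g, c in enumerate(hist)) / total
    bounds = [0] + sorted(thresholds) + [len(hist)]
    score = 0.0
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        count = sum(hist[lo:hi])
        if count == 0:
            continue  # an empty class contributes nothing
        mean = sum(g * hist[g] for g in range(lo, hi)) / count
        score += (count / total) * (mean - global_mean) ** 2
    return score


if __name__ == "__main__":
    # Hypothetical 8-level histogram with a dark cluster and a bright cluster.
    hist = [10, 10, 0, 0, 0, 0, 10, 10]
    print(between_class_variance(hist, [4]))  # splits the two clusters
    print(between_class_variance(hist, [1]))  # cuts through the dark cluster
```

A threshold that separates the two clusters scores higher than one that cuts through a cluster, which is the behavior the optimizer exploits.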
In the area of image enhancement, PSO is utilized in [233] to address the inaccurate nature of retinal images. The effectiveness of the proposed approach is validated on two well-known image datasets and results have shown that PSO can significantly enhance the quality of retinal images compared with GA, ACO, and ABC. The authors in [234] proposed using PSO with the Haar wavelet transform to compress medical images. The simulation results have shown that the proposed scheme can achieve high PSNR, which indicates that the quality of the compressed images is close to the quality of the original images. In [235], PSO is used with an intertwining logistic map to develop a blind watermarking scheme. When the proposed method was tested on eight classical grayscale images, results showed that PSO can efficiently optimize watermark embedding strength.

D. ELECTRICAL POWER SYSTEMS
PSO has been widely applied to optimize the performance of electrical power systems including economic dispatch [237][238][239], optimal power flow [240][241][242], state estimation [243], power system controllers [244,245], unit commitment [246], and capacitor placement [247]. A detailed and thorough survey on the applications of PSO in electrical power systems is provided in [248]. The survey focused on ten areas including optimal power flow, economic dispatch, reactive power dispatch, and maintenance scheduling. Recently, the works in [249,250] have provided a comprehensive review of the applications of PSO and its variants to the economic dispatch problem. The authors in [246] have recently reviewed the state-of-the-art applications of PSO to the unit commitment problem. According to [246], most of the PSO-based unit commitment research has considered single-objective optimization that minimizes cost only, while multi-objective optimization that jointly minimizes cost and emission is not well studied yet. The work in [247] applied a hybrid ABC-PSO algorithm to IEEE 34-node and 69-node radial distribution networks to find the best capacitor placement and size that can help to reduce power loss. The proposed technique can achieve lower power loss when compared with the individual PSO and ABC algorithms; nevertheless, its performance is not compared with high-performance PSO variants or with other well-known meta-heuristic algorithms such as GWO and the whale optimization algorithm (WOA). The work in [243] hybridized PSO with the gravitational search algorithm (GSA) to solve the problem of state estimation in distribution systems. Results have shown that the proposed hybrid scheme is more accurate and reliable than the standard PSO algorithm and the original GSA approach.
Table fragment (row and column structure not recoverable). PSO variants listed: [204], immune cooperative PSO [205], extended binary PSO [206], chaotic PSO [207], binary PSO [208], adaptive discrete PSO [209], binary quantum elite PSO [210], improved PSO [211], two-phase PSO [212], adaptive PSO [217], multi-objective PSO [218], [219]. Application areas listed: FIR digital filter design, IIR digital filter design, estimation of nonstationary signals, antenna array design.

VII. PSO DRAWBACKS
Despite the excellent performance of its variants, PSO suffers, in general, from some weaknesses that can be alleviated by introducing new modifications to the current PSO variants. The literature reports several concerns about PSO performance which can be outlined as follows:

A. PREMATURE CONVERGENCE
One of the major performance problems of PSO is premature convergence as pointed out in [40,283]. This problem occurs due to the lack of population diversity especially in complex multimodal functions [40]. The work in [37,71,284] presented important PSO variants that have shown remarkable performance in terms of avoiding premature convergence. Nevertheless, much more research is needed to address this problem.

B. THE DIFFICULTY OF CONTROLLING THE PSO PARAMETERS
Although there are only three parameters to be controlled in PSO (the cognitive ratio $c_1$, the social ratio $c_2$, and the inertia weight $w$), it is difficult to control these parameters and find their appropriate settings at each iteration. Despite extensive efforts to propose methods for controlling $c_1$, $c_2$, and $w$, none of these methods guarantees that the optimal setting of $c_1$, $c_2$, and $w$ can be achieved.

C. IMPROPER VELOCITY ADJUSTMENT
Improper velocity adjustment occurs when inappropriate values of the cognitive ratio $c_1$, the social ratio $c_2$, and the inertia weight $w$ are chosen. This makes the particles fly in undesired directions, causing stagnation around or near the optimum solution [285].
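To show where the three controlling parameters enter the velocity update, the following is a minimal sketch of a global-best PSO for minimization. It uses the conventional names w, c1, and c2 for the inertia weight, cognitive ratio, and social ratio; the parameter values, the velocity clamp v_max, and the function name `pso` are illustrative assumptions rather than settings recommended by the works reviewed here. The returned history records the best-so-far value per iteration, which is the raw material for a convergence plot.

```python
import random


def pso(objective, dim, bounds, n_particles=20, iters=100,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective over a box; returns (best_pos, best_val, history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    v_max = 0.2 * (hi - lo)  # velocity clamp; an illustrative choice
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = []  # best-so-far value after each iteration
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]                              # inertia
                     + c1 * r1 * (pbest[i][d] - pos[i][d])      # cognitive pull
                     + c2 * r2 * (gbest[d] - pos[i][d]))        # social pull
                vel[i][d] = max(-v_max, min(v_max, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


if __name__ == "__main__":
    sphere = lambda x: sum(t * t for t in x)
    best, value, curve = pso(sphere, dim=3, bounds=(-5.0, 5.0))
    print(value, curve[0], curve[-1])
```

Rerunning the sketch with, for example, a much larger w or very different c1 and c2 gives a quick feel for how sensitive the search is to these settings.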

VIII. POTENTIAL RESEARCH DIRECTIONS
Although PSO variants have shown promising results in solving optimization problems, PSO can still be developed further to improve its performance when applied to complex real-world optimization problems. The following provides some potential future directions to be considered by researchers who are interested in PSO and its applications:
1) The original PSO and its recent variants presented in Section III can be hybridized with other recent high-performance metaheuristic algorithms such as the Equilibrium Optimizer (EO) [286], the Marine Predators Algorithm (MPA) [287], the Gradient-based Optimizer (GBO) [288], the Political Optimizer (PO) [289], the Arithmetic Optimization Algorithm (AOA) [290], and the Archimedes optimization algorithm [291].
2) The recent PSO variants presented in Section III can be converted into binary PSO algorithms and utilized to solve binary problems such as feature selection and the 0-1 knapsack problem.
3) Some of the binary PSO variants presented in Section IV are applied to feature selection only while others are applied to the 0-1 knapsack problem. It would be interesting to apply each recent binary PSO variant to both problems and evaluate the performance.
4) Some of the binary transfer functions presented in Table IV have not been investigated and could be utilized to test the performance of PSO using such transfer functions.
5) The performance of PSO variants on high-dimensional problems is not well studied yet. The performance of recent PSO variants on high-dimensional problems can be investigated. In addition, further work is needed to develop new PSO variants that can perform well on both low- and high-dimensional problems.
6) The development of new PSO variants that can solve multi-objective problems is a promising research direction to be considered.
7) Recent PSO variants can be applied to solve a wide range of real-world optimization problems such as data clustering [292], maintenance scheduling [293], lot-sizing optimization [294,295], and supply-chain network optimization [296,297].
8) One promising research direction is to hybridize wrapper approaches that implement PSO variants with filter methods to solve feature selection problems.

IX. CONCLUSION
PSO is a simple, robust, and fast optimizer that can solve complex real-world optimization problems. To overcome the limitations of the standard PSO, extensive research efforts have been made to modify the original PSO algorithm into better variants by applying several methods including controlling the PSO parameters, hybridizing PSO with other searching algorithms, and using multi-swarm techniques. This work presents an overall review of the distinct research works that have been conducted on PSO. The review starts by explaining the basic concepts of PSO. Then, it describes the different topologies that can be used in PSO and provides a comprehensive review of the recent and historical prominent PSO variants. The review also covers binary PSO, remarkable engineering applications of PSO, and the drawbacks of PSO. More specifically, this review paper has focused on PSO-based feature selection. Finally, this work provides some potential research directions that can help researchers further enhance the performance of PSO. In a nutshell, there is still room for improvement in PSO development to provide better performance when applied to complex high-dimensional real-world optimization problems.