PSO-X: A Component-Based Framework for the Automatic Design of Particle Swarm Optimization Algorithms

—The particle swarm optimization (PSO) algorithm has been the object of many studies and modiﬁcations for more than 25 years. Ranging from small reﬁnements to the incorporation of sophisticated novel ideas, the majority of modiﬁcations proposed to this algorithm have been the result of a manual process in which developers try new designs based on their own knowledge and expertise. However, manually introducing changes is very time consuming and makes the systematic exploration of all the possible algorithm conﬁgurations a difﬁcult process. In this article, we propose to use automatic design to overcome the limitations of having to manually ﬁnd performing PSO algorithms. We develop a ﬂexible software framework for PSO, called PSO-X, which is speciﬁcally designed to integrate the use of automatic conﬁguration tools into the process of generating PSO algorithms. Our framework embodies a large number of algo-rithm components developed over more than 25 years of research that have allowed PSO to deal with a large variety of problems, and uses irace , a state-of-the-art conﬁguration tool, to automa-tize the task of selecting and conﬁguring PSO algorithms starting from these components. We show that irace is capable of ﬁnding high-performing instances of PSO algorithms never proposed before.


I. INTRODUCTION
C OMPUTATIONAL intelligence algorithms, such as particle swarm optimization (PSO) and evolutionary algorithms (EAs), are widely used to tackle complex optimization problems for which exact approaches are often impractical [1], [2]. The application of these algorithms has been shown to be instrumental in a growing number of areas where efficiently using resources, obtaining a higher degree of automation, or finding support in decision making are needed on a regular basis. While the application of computational intelligence algorithms usually seeks higher efficiency and automation, their development is, on the contrary, mostly done following a manual approach based on the intuition and expertise of the developers [3]. The manual development of such algorithms presents a number of drawbacks: it is a slow process based on trial and error; it limits the number of design alternatives that an implementation designer can explore; and it does not provide a principled way to explore the space of possible algorithms, thus making the development process difficult to reproduce.
To alleviate these issues, it has been recently proposed a development framework based on components [5], [6], which includes automatic configuration tools for creating high-performing algorithms. As opposed to manual approaches, where algorithms are typically seen as monolithic blocks with a few numerical parameters whose design is modified based on the experience and knowledge of the algorithm designer, in a component-based approach [3] algorithms are seen as a particular combination of algorithm components. The design of algorithms using a component-based approach relies on three key elements: 1) a software framework from which algorithm components can be selected; 2) a set of rules indicating a coherent way to combine the components in the software framework; and 3) the use of an automatic configuration tool to evaluate the performance of different designs and parameter settings.
Compared to the number of works devoted to other widely used algorithms, such as ant colony optimization [5] and artificial bee colony [7], there are very few previous work attempting the automatic design of PSO algorithms. The two most relevant of these works are [8] and [9], where the authors used grammatical evolution (GE) to evolve novel velocity update rules in PSO. The main limitation of these works is the low number of different components that can be combined. In [8], only the social and cognitive components of the velocity update rule can be automatically designed; in [9], the list of components includes also the topology and swarm size, but the grammar that defines the rules to combine components is based on the standard version of PSO and makes difficult to include recent algorithm components.
To overcome these limitations, in this article, we propose PSO-X , a flexible, component-based framework containing a large number of algorithm components previously proposed in the PSO literature. In PSO-X , each algorithm component can assume a set of different values and PSO-X generates a specific PSO algorithm by selecting a value for each possible component. To do so, PSO-X uses a generalized PSO template that is flexible enough to combine the algorithm components in many different ways, and that is sufficient to synthesize many wellknown PSO variants published in the last two decades. Most of such flexibility is achieved through the use of a generalized This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ velocity update rule (GVUR)-the core component of PSO. The goal of using a generalized velocity rule is to facilitate the abstraction of the elements typically used in this algorithm component in order to allow the combination of concepts that operate at different levels of the algorithm design. For example, with our template and the GVUR, a high-level component, such as the type of distribution of all next possible particle positions, can interact with specific types of perturbation and a number of strategies to compute their magnitude.
PSO-X provides two important benefits when implementing PSO algorithms: first, the possibility of easily creating many different implementations combining a wide variety of algorithm components from a single framework; second, the possibility of using automatic configuration tools to tailor implementations of PSO to specific problems according to different scenarios. Here, we aim at showing that developing PSO algorithms using PSO-X is more efficient and produces implementations capable of outperforming manually designed PSO algorithms. To assess the effectiveness of our PSO-X framework, we compare the performance of six automatically generated PSO implementations with ten of the best-known variants proposed in the literature over a set of 50 benchmark problems for evaluating continuous optimizers.
The remainder of this article is structured as follows. In Section II, we present a review of the main concepts of PSO and of the mechanism used by irace to automatically create high-performing implementations. In Section III, we identify particular design choices proposed since the earliest PSO publication and discuss their functional purpose in the implementation of the algorithm. In Section IV, we introduce our PSO algorithm template and, from Sections IV-A-IV-E, we explain how it can be used to create PSO variants. The experimental procedure we followed is explained in Section V. The results of the experiments conducted to evaluate the performance of the PSO-X algorithms are presented in Section VI. In Section VII, we conclude this article and mention future research directions.

II. PRELIMINARIES A. Continuous Optimization Problems
In this article, we consider the application of PSO to continuous optimization problems. Without loss of generality, we consider minimization problems where the goal is to minimize a d-dimensional continuous objective function The search space S is a subset of R d in which a solution is represented by a real-valued vector x, and each component x j of x is constrained by a lower and upper bound such that lb j ≤ x j ≤ ub j , for j = 1, . . . , d. The vector o represents the solution for which the evaluation function f (·) returns the minimum value.

B. Particle Swarm Optimization
Particle swarm optimization [10] is a stochastic search algorithm where a set of "particles" search for approximate solutions of continuous optimization problems. In PSO, each particle moves in the search space by repeatedly applying velocity and position update rules. Each particle i has, at every iteration t, three associated vectors: 1) the position x i t ; 2) the velocity v i t ; and 3) the personal best position p i t . The vector x i t is a candidate solution to the optimization problem considered whose quality is evaluated by the objective function f (·).
In addition to these vectors, each particle i has a set N i of neighbors and a set I i of informants. Set N i contains the particles from which i can obtain information, whereas I i ⊆ N i contains the particles that will indeed provide the information used when updating i's velocity. The way the sets N i -which define the topology of the swarm [11]-and the sets I i -which we refer to as models of influence-are defined are two important design choices in PSO. Sets N i can be defined in many different ways producing a large number of possible different topologies; the two extreme cases are the fully connected topology, in which all particles are in the neighborhood of all other particles, and the ring topology, where each particle is a neighbor of just two adjacent particles. Examples of other partially connected topologies include lattices, wheels, random edges, etc. The model of influence can also be defined in different ways, but the vast majority of implementations employ either the best-of-neighborhood which contains the particle with the best personal best solution in the neighborhood of i (which includes particle i itself), or the fully informed model, In the standard PSO (StaPSO) [12], the rule used to update particles' position is where the velocity vector v i t+1 of the ith particle at iteration t + 1 is computed using an update rule that involves v i t , p i t , and l i t . The vector l i t indicates the best among the personal best positions of the particles in the neighborhood of i; formally, it is equal to p k t where k = arg min j∈N i {f ( p j t )}. Note that when a fully connected topology is employed, vector l i t becomes the global best solution and is indicated as g t .
The velocity update rule of StaPSO is defined as follows: where ω is a parameter, called inertia weight, used to control the influence of the previous velocity, and ϕ 1 and ϕ 2 are two parameters known as the acceleration coefficients (ACs) that control the influence of ( , respectively, known as the cognitive influence (CI) and the social influence (SI), is to attract particles toward high-quality positions found so far. U i 1t and U i 2t are two d×d diagonal matrices whose diagonal values are random values drawn from U (0, 1]; their function is to induce perturbation to the CI and SI vectors.
The rule to update the personal best position of particle i is (3)

C. Automatic Algorithm Configuration
We employed a state-of-the-art offline configuration tool called irace [13]. This tool has been shown to be capable of dealing with the task of selecting, configuring, and generating high-performing algorithms by finding good algorithm configurations whose performance can be generalized to unseen problem instances. To do so, irace implements a procedure called iterated racing [13], which is based on the machine learning model selection approach called racing [14] and on Friedman's nonparametric two-way analysis of variance. Iterated racing consists of the following steps. First, it samples candidate configurations from the parameter space. Second, it evaluates the candidate configurations on a set of instances by means of races, whereby each candidate configuration is run on one instance at a time. Third, it discards the statistically worse candidate configurations identified using a statistical test based on Friedman's nonparametric twoway analysis of variance by ranks. During the configuration process, which is done sequentially and uses a given computational budget, irace adjusts the sampling distribution in order to bias new samplings toward the best configurations found so far. When the computational budget is over, irace returns the configuration that performed best over the set of training instances. irace is capable of handling the different types of parameters included in our framework, that is, numerical (e.g., ω, ϕ 1 , or ϕ 2 ), categorical (e.g., topology), and subordinate parameters, that is, parameters that are only necessary for particular values of other parameters (e.g., when the size of the population changes in the implementation, it is necessary to configure the maximum and minimum number of particles in the swarm, but not when the size remains constant).

III. DESIGN CHOICES IN PSO
Many algorithm components have been proposed for PSO over the years [15], [16] with the goal of improving its performance and enabling its application to a wider variety of problems. In this article, we have categorized these algorithm components into five different groups: 1) those used to set the value of the main algorithm parameters; 2) those that control the distribution of particles positions in the search space; 3) those used to apply perturbation to the velocity and/or position vectors; 4) those regarding the construction and application of the random matrices; and 5) those related to the topology, model of influence, and population size.
Group 1) comprises the time-varying and adaptive/selfadaptive parameter control strategies used to compute the value of ω, ϕ 1 , and ϕ 2 . Time-varying strategies take place at specific iterations of the algorithm execution; while adaptive and self-adaptive strategies use information related to the optimization process (e.g., particles average velocity, convergence state of the algorithm, average quality of the solutions found, etc.) to adjust the value of the parameters. Because the value of ω, ϕ 1 , and ϕ 2 heavily influences the exploration/exploitation behavior of the algorithm, parameter control strategies are abundant in the PSO literature [17]. In particular, a lot of attention has been given to control strategies focused on adjusting the value of ω, which is intrinsically related to the local convergence of the algorithm. Locally convergent implementations not only guarantee to find a local optimum in the search space but also prevent issues, such as 1) swarm explosion, which happens when a particle's velocity vector grows too large and the particle becomes incapable of converging to a point in the search space [19] and 2) poor problem scalability, which means that the algorithm performs poorly on high-dimensional problems [20]. In fact, the poor problem scalability issue has become very relevant in the last years because of the increasing number of problems involving large-dimensional spaces where PSO is applicable. It has been observed that unwanted particles roaming in high-dimensional spaces is a substantial part of this issue and that, in variants, such as StaPSO, parameter values for ω, ϕ 1 , and ϕ 2 that perform well in low-dimensional spaces will most likely perform poorly in large dimensional ones [21]. A number of strategies have been proposed to address this issue, such as reinitialization [22], group-based random diagonal matrices [23], and perturbation mechanisms [20].
In group 2) are the algorithm components used to control the distribution of all next possible positions (DNPPs) of the particles. The chosen DNPP determines the way particles are mapped from their current position to the next one. We consider the three main DNPP proposed in the literature-the rectangular (used in StaPSO), the spherical (used in standard PSO 2011 (SPSO11) [24]), and the additive stochastic, which comprises the recombination operators proposed for simple dynamic PSO algorithms [25]. Although some DNPP mappings suffer from transformation variance-which happens when the algorithm performs poorly under mathematical transformations of the objective function, such as scale, translation, and rotation-there are a number of algorithm components that have been developed to prevent this issue.
Group 3) is composed of the algorithm components that allow to apply perturbations to the particles velocity/position vectors. In general, in PSO, perturbation mechanisms can be informed or random. Informed perturbation mechanisms receive a position vector as an input (typically p i t or x i t ) and use it to compute a new vector that replaces the one that was received. The typical way in which informed mechanisms work is by using the components of the input vector as the center of a probability distribution and mapping random values around them; however, other options found in the literature include computing the Hadamard product between the input vector and a random one, or randomly modifying the components of the input vector. Differently, random perturbation mechanisms add a random value to a particle's position or velocity. Perturbation mechanisms proposed for PSO are used to improve the diversity of the solutions [20], [26], avoid stagnation [27], and avoid divergence [28]. Additionally, some of these mechanisms allow to modify the DNPP of the particles; an example is the mechanism proposed in [20], where a Gaussian distribution is used to map random points on spherical surfaces centered around the position of the informants.
One of the main challenges in most perturbation mechanisms is the determination of the perturbation magnitude (PM): a strong perturbation may prevent particles from efficiently exploiting high-quality areas of the search space, while a weak one may not produce any improvement at all. In order to allow convergent implementations to take advantage of the perturbation mechanism, some magnitude control strategies take into account the state of the optimization process to adjust the magnitude at run time. An example is [28], where a parameter decreases the PM when the best solution found so far has been constantly improving, whereas increases it when the algorithm is stagnating. Another example is [20], where the magnitude is computed based on the Euclidean distance between the particles so as to decrease it as particles converge to the best solution found so far.
The algorithm components in group 4) corresponds to the random matrices, whose function, similarly to some perturbation mechanism of group 3), is to provide diversity to particles movement. The main difference between the random matrices and the perturbation mechanisms described above is that the former can be used to produce changes in the magnitude and direction of the CI and SI vectors, while the latter allows only to apply perturbation to individual positions used in the computation of the CI and SI. In the StaPSO algorithm, the random matrices [U i 1 and U i 2 , see (2)] are usually constructed as diagonal matrices with values drawn from U (0, 1); however, in some implementations of StaPSO (e.g., [16]), the matrices are replaced by two random values r i 1 and r i 2 -in this case, particles oscillate linearly between p i t and l i t without being able to move in different directions, preventing transformation variance [16] but affecting the performance of the algorithm. Using random rotation matrices (RRMs), instead of random diagonal matrices, is another way to address transformation variance in PSO. RRMs allow to apply random changes to the length and direction of the vectors in the velocity update rule without being biased toward some particular reference frame. The two main methods that have been used to create RRMs in the context of PSO are exponential map [29] and Euclidean rotation [30].
The last group of algorithm components we identified in our work, group 5), includes the topology, model of influence, and population size. The topology plays an important role in the way the algorithm will modulate its exploration-exploitation capabilities. In addition to the well-known fully connected, ring, and von Neumann topologies, there are other topologies that have been explored in the PSO literature, such as the hierarchical and small-world network. In [31], a topology that decreases connectivity over time was proposed. Concerning the model of influence, besides the best-of-neighborhood and the fully informed, another option is the ranked fully informed model of influence [32], in which the contribution of each informant is weighted according to its rank in the neighborhood. Concerning population size, it has recently been proposed to increase or decrease the number of particles according to some metrics [33], [34]. The number of particles in the swarm has an impact on the tradeoff between solution quality and speed of the algorithm [16], [34]. In general, a large population should be used as it can produce better results. However, a small population may be the best option when the objective function evaluation (FE) is expensive or when the number of possible FEs is limited.
IV. DESIGNING PSO ALGORITHMS FROM ALGORITHM TEMPLATE In this section, we explain the way in which the algorithm components reviewed in the previous section can be combined using the PSO-X framework. In the reminder of this article, we use Sans Serif font to indicate the name of the algorithm components and of their options as implemented in PSO-X .

5:
apply velocity clamping %optional 6: A. Algorithm Template for Designing PSO Implementations Algorithm 1 depicts the PSO-X 's algorithm template. A swarm of particles (swarm) is created using the INITIALIZE() procedure that assigns to each particle a set N i , a set I i , an initial position, and an initial velocity based on the Population, Topology, and Model of influence indicated by the framework user. Additionally, the INITIALIZE() procedure creates and initializes any variable required to use the algorithm components included in the implementation. The two for cycles of lines 3-7 and lines 8-11 correspond to the standard implementation of PSO-except for line 4 that shows our GVUR, defined as follows: where DNPP represents the type of mapping from a particle's current position to the next one, and Pert rand represents an additive perturbation mechanism. The parameter ω 1 is the same as the inertia weight in StaPSO [see (2)] and its value can be computed using the strategies that have been developed for this purpose (the list of the strategies available to compute its value are shown in Table 4 of the supplementary material [4]). The parameters ω 2 and ω 3 control the influence that will be given to the DNPP and Pert rand components; their values can be set equal to ω 1 or be computed using the random component, where ω 2 and ω 3 are user selected constants in the interval [0, 1]. We use three independent ω parameters so that it is easy to disable any of the GVUR components. For example, a velocity-free PSO can be easily obtained by setting ω 1 = 0. After all particles have updated their position, two procedures can take place: 1) UPDATEPOPULATION(), that increases/decreases the size of the swarm according to the type of Population employed and 2) UPDATETOPOLOGY(), that connects newly added particles to a set of neighbors, or disconnect particles when the topology connectivity reduces over time.
All implementations that can be created using Algorithm 1 and combining algorithm components with different functionalities, such as different DNPPs or topologies, are considered valid implementations by PSO-X . This allows enough flexibility to explore many new designs without increasing too much the computational complexity of the implementations.
In particular, in PSO-X , we do not allow the recursive use of components, as it is done in the component-based framework using grammars [3]. In Table 2 of the supplementary material [4], we show the algorithm components and parameters in the framework, their domain and type, and the condition(s) under which each parameter is used in PSO-X .
The DNPP-rectangular option is defined as follows: where Mtx and Pert info are, as mentioned before, high-level representations of the different types of random matrices and informed perturbation mechanisms used in PSO. The DNPP-rectangular is by far the most commonly used in implementations of PSO, including StaPSO [12], the constriction coefficient PSO [19], the fully informed PSO (FiPSO) [35], etc. In the standard application of DNPP-rectangular (i.e., as in StaPSO), each term added in (5) is a vector located on a hyperrectangular surface whose side length depends on the distance between p k t and x i t . However, when the perturbation component of DNPP-rectangular is an informed Gaussian-as in the locally convergent rotationally invariant PSO (LcRPSO) [20]-or an RRM-as in the diverse rotationally invariant PSO (DvRPSO) [29]-the surface on which the different vectors computed in (5) are located becomes hyperspherical or semi-hyperspherical, respectively.
Another option is DNPP-spherical [24], where a vector located on a hypersphere is used in the computation of a particle new position. The equation to compute the DNPP-spherical option is as follows: The center c i t is computed as follows: where and ϕ k 2t = (ϕ 2t /|I i t \ {i}|). The main difference between DNPP-spherical and the standard implementation of DNPPrectangular is that the hypersphere is invariant to rotation around its center, whereas DNPP-rectangular is rotation variant unless another component is used to overcome this issue-e.g., a Gaussian perturbation, as done in the LcRPSO variant. While the DNPP-spherical and the LcRPSO combining the DNPP-rectangular with a Gaussian perturbation component use the same idea, they work in a different way. In the DNPP-spherical DNPP, there is a single vector mapped randomly in the hypersphere H( c i t , | c i t − x i t |) and the informants of i participate only in the computation of vector L i t [see (9)]; 1 whereas in the LcRPSO variant, there are n different vectors, one for each informant of i, each mapped on a spherical surface, and the new velocity of the particle is obtained by adding all n vector, as shown in (5).
The DNPP-standard, DNPP-discrete, DNPP-Gaussian, and DNPP-Cauchy-Gaussian options belong to the class of simple dynamic PSO algorithms [25], [36] and have the form DNPP-Cauchy-Gaussian: where η d ∼ U {0, 1} is a discrete random number drawn from a Bernoulli distribution, C(1) is a random number generated using a Cauchy distribution with scaling parameter 1, N (0, 1) is a random number from a Normal distribution with mean 0 and variance 1, and r is a parameter that allows the user to select the probability with which the Cauchy or the Normal distributions are used in (13). Vectors p i t and p k t are computed using p i t = Pert info ( p i t ) and p k t = Pert info ( p k t ) with k ∈ I i . Unlike options DNPP-standard, DNPP-discrete, and DNPP-Gaussian, where the mapping between particles i and k is deterministic, in DNPP-Cauchy-Gaussian, the value of the jth dimension of q i t is computed with probability r using p i t and a Cauchy distribution; and with probability 1 − r using p k t and a Normal distribution.
Although we kept the original definition of these DNPPs for the most part, we did two modifications: we included the Pert info component (i.e., vectors p i t and p k t instead of p i t and p k t ) and the possibility of using a random informant model of influence (MoI-random informant), which consists in choosing a random particle from N i and use it as informant.

C. Pert rand and Pert info Components
The two types of perturbation components included in PSO-X are: 1) Pert info , which modifies an input vector and 2) Pert rand , which generates a random vector that is added to the velocity vector. Pert info , as explained in Section IV-B, is a component used by the DNPP component. Differently, Pert rand is used directly in the GVUR.
As shown in Table I, both Pert info and Pert rand are optional components in PSO-X that can be omitted from the implementation using the none option. The options for Pert info , when the component is present in the implementation, 1 In the original definition of (9), vector L i t was defined considering a best-of-neighborhood model of influence. In this article, we have extended the computation of L i t to an arbitrary number of informants. are Pert info -Gaussian, Pert info -Lévy, and Pert info -uniform. Pert info -Gaussian and Pert info -Lévy compute a random vector by using a probability distribution whose center and dispersion are given by the input vector r and by the parameter σ t that controls the magnitude of the perturbation. Similarly, in Pert info -uniform, the PM depends on a parameter b t , that controls the interval in which a random vector s will be generated using a uniform distribution. Regarding the Pert rand component, both Pert rand -rectangular and Pert rand -noisy employ a random uniform distribution to generate a random vector; the magnitude of the perturbation is controlled in this case by parameters τ t and δ t , respectively. In Pert info -Lévy, the value of γ t can be used to switch between a Gaussian and a Cauchy distribution [37]. That is, when γ t = 1, the Lévy distribution is equivalent to the Gaussian distribution, and when γ t = 2, it is equivalent to the Cauchy distribution. In PSO-X , the value of γ t is obtained sampling from the discrete uniform distribution U {10, 20} This allows to vary the probability of generating a random value in the tail of the distribution. This way of computing the value of γ t is similar to the one used in [36] for computing the DNPP-Cauchy-Gaussian option assuming r = 0.5 to give the same probability to each case [see (13)].
Since the PM plays a critical role in the effectiveness of perturbation components, setting its value (either offline or during the algorithm execution) is often challenging. In PSO-X , we implemented four strategies for computing the PM that can be used with any of the Pert info and Pert rand components. These strategies are PM-constant value, PM-Euclidean distance, PM-obj.func. distance, and PM-success rate.
The PM-constant value strategy [26] is the simplest and consists in using a value that remains constant during the execution of the algorithm. This strategy guarantees that the PM is always greater than zero-a condition that has to be verified for all perturbation strategies. However, the main problem with the PM-constant value strategy is that using the same value may not be effective for the different stages of the optimization process. For example, particles that are farther away from the global best solution may benefit from a large PM value in order to move to higher quality areas, while for those particles that are near the global best solution, a small PM value would make exploitation easier.
The PM-Euclidean distance strategy [20] consists in using the Euclidean distance between the current position of particle i and the personal best of a neighbor k. This strategy is defined as follows: where 0 < ≤ 1 is a parameter used to weigh the distance between x i t and p k t . The PM-obj.func. distance is very similar to the PM-Euclidean distance, but the distance between particles is measured in terms of the quality of the solutions. The equation to compute the PM using PM-obj.func. distance is where 0 < m ≤ 1 is a parameter. For particles whose quality is very similar to that of the local best, the PM will be small, enhancing exploitation; and for those whose quality is poor compared to that of the local best, the PM will be large allowing them move to far areas of the search space. The mechanism implemented in PM-success rate [28] to compute the PM takes into account the success rate of the algorithm in terms of improving the best solution's quality. The value of the PM is adjusted depending on the number of consecutive iterations in which the swarm has succeeded (#successes) or failed (#failures) to improve the best solution found so far, where iteration t → t + 1 is a success if f ( g t+1 ) < f ( g t ), a failure otherwise. The PM-success rate strategy is defined as follows: where the threshold parameters s c and f c are user defined.

D. Mtx Component
The options for the Mtx algorithm component in PSO-X are Mtx-random diagonal, Mtx-random linear, Mtx-exponential map, and Mtx-Euclidean rotation. The Mtx-random diagonal and Mtx-random linear options are both d×d diagonal matrices whose values are drawn from a U (0, 1); the only difference between them is that, in Mtx-random linear, one random value is repeated d times in the matrix diagonal, whereas, in Mtx-random diagonal, the matrix contains d independently sampled values.
The Mtx-exponential map [29] option is based on an approximation method called exponential map whereby RRMs can be constructed avoiding matrix multiplication, which is computationally expensive. Mtx-exponential map is defined as where I is the identity matrix, α is a scalar representing the rotation angle, and A is an n × n random matrix with uniform random numbers in [−0.5, 0.5]. To keep the computational complexity low, we set max β = 1.
The Mtx-Euclidean rotation [30] rotates a vector in any combination of planes. 2 An Mtx-Euclidean rotation for rotating axis x i in the direction of x j by the angle α is given by a matrix [r mn ] with r ii = r jj = cos α, r ij = − sin α, and r ji = sin α, and the remaining values are set to 1 if they are on the diagonal or to zero otherwise. Since [r mn ] is an identity matrix except for the entries at the intersections between rows i and j and columns i and j, the multiplication between [r mn ] and v is done as follows: where v k indicates the kth entry of vector v. We use Mtx-Euclidean rotation all to indicate when Mtx-Euclidean rotation is used to rotate a vector in all possible combination of planes, and Mtx-Euclidean rotation one to indicate when it is used to rotate in only one plane. The strategies to compute the rotation angle are α-constant, α-Gaussian, and α-adaptive. In α-constant, the value of α is defined by the user, whereas in α-Gaussian and α-adaptive, it is obtained by sampling values from N (0, σ ). The value of σ when the Gaussian distribution is used can be a user defined parameter, as in α-Gaussian, or be computed using an adaptive approach, as in α-adaptive, which is defined as follows: where ζ and ρ are two parameters and ir t is the number of improved particles in the last iteration divided by the population size. The last option for the Mtx component is Mtx-Increasing group-based [23] that divides a random diagonal matrix into g t groups and every element in each group has the same value, generated uniformly random. The number of groups at each iteration is computed using the following equation: where d is the number of problem dimensions and t max is the iteration number at which the algorithm stops. Note the algorithm starts with g t = 1 and the number increases over time until there are g t = d groups, which is equivalent to gradually transforming an Mtx-random linear component into a Mtx-random diagonal one.

E. Topology, Model of influence and Population Components
In addition to the well-known options for the Topology component discussed in Sections II-B and III and showed in  In Top-hierarchical [38], particles are arranged in a regular tree-i.e., a tree graph with a maximum branching degree (bd) and height (h)-where they move up and down based on the quality of their p t vector, and sets N i contain only the particles that are in the same branch of the tree as particle i but in a higher position. The topology is updated at the end of each iteration starting from the root node and consists of each particle comparing the quality of its p t vector with that of its parent and switching places when it has higher quality.
The Top-time-varying [31] is a topology that reduces its connectivity over time: it starts as a fully connected topology and every κ iterations a number of edges is randomly removed from the graph until the topology is transformed into a ring. The value of κ, which controls the velocity at which the topology is transformed, is a multiple of the number of particles in the swarm, so that the larger the value of κ the faster the topology will be disconnected. Additionally, the number of edges to be removed follows an arithmetic regression pattern of the form n − 2, n − 3, . . . , 2, where n is the swarm size.
The options for the Model of influence component are MoIbest-of-neighborhood, where sets I i contains i and the local best particle in the neighborhood of i; the MoI-fully informed, where sets I i = N i ; MoI-ranked fully informed, which is similar to the MoI-fully informed, but particles in I i are ranked according to their quality so that the influence of a particle with rank r is twice the influence of a particle with rank r − 1; and MoI-random informant, which allows particles to select a random neighbor from N i to form set I i .
The options for the Population component are Popconstant, Pop-time-varying, and Pop-incremental. In Poptime-varying [33], there is a maximum (pop max ) and minimum (pop min ) number of particles that can be in the swarm at any given time. Particles are added or removed according to two criteria: 1) add one particle if the best solution found has not improved in the previous k consecutive iterations and the swarm size is smaller than pop max and 2) remove the particle with the lowest quality if the best solution found has improved in the previous k consecutive iterations and the swarm size is larger than pop min . Whenever criterion 1) is verified, but the swarm size is equal to pop max , the particle with the lowest quality is removed before adding the new random particle.
In Pop-incremental [34], the algorithm starts with an initial number of particles (pop ini ) and, at each iteration, there are ξ new particles added to the swarm until a maximum number is reached (pop fin ).
The initial position of newly added particles (x new ) can be computed using Init-random or Init-horizontal. In Init-random where lb j and ub j are the lower and upper bound of the jth dimension of the search space. In Init-horizontal, a horizontal learning approach is applied to x new,j after it has been randomly initialized in the search space Using a dynamic population requires that the topology is updated in order to assign newly added particles to a neighborhood or to reconnect particles that were connected to a particle that was removed. This is handled as follows.
1) Particles Are Added to a Fixed Topology: The topology is extended by connecting a newly added particle with a set of neighbors randomly chosen. In Top-hierarchical, new particles are always placed at the bottom of the tree.

2) Particles Are Added to a Time-Varying Topology:
We assignĈ i t neighbors to every new particle, whereĈ i t is the average number of neighbors that every particle in the swarm has at iteration t.

3) Particles Are Removed:
The topology is repaired to ensure that every particle has the right number of neighbors.

F. Acceleration Coefficients
The four strategies that can be used to compute the ACs in PSO-X are: 1) AC-constant; 2) AC-random; 3) AC-timevarying; and 4) AC-extrapolated. In AC-random, the value of ϕ 1t and ϕ 2t is drawn from U [ϕ min , ϕ max ], where 0 ≤ ϕ min ≤ ϕ max ≤ 2.5 are user selected parameters. The AC-time-varying strategy is the one proposed in [39], where ϕ 1 decreases from 2.5 to 0.5 and ϕ 2 increases from 0.5 to 2.5. In the AC-extrapolated strategy, proposed in [40], the value of the ACs is a function of the iteration number and particles quality computed as follows: , the step size of the particle will be larger, and when f ( l i t ) f ( x i t ) it will be smaller.

G. Reinitialization Components and Velocity Clamping
The last group of components in PSO-X have been proposed with the goal of avoiding performance issues that affect PSO, such as divergence and stagnation.
The first ones is stagnation detection [41]. It is used to perturb the velocity vector of a particle when its current position is too close to the global best solution, and the velocity magnitude is not large enough to let the particle move to other parts of the search space. That is, when || v i t || + || g t − x i t || ≤ μ, where μ > 0 is a user defined threshold for the perturbation to occur. When the stagnation condition is verified, the velocity vector of the particle is randomly regenerated as follows: The second component, particles reinitialization [22] is used to regenerate the position vector of the particles in case of early stagnation or ineffective movement is occurring. Early stagnation is considered to be affecting the implementation when the standard deviation of the p t vectors is lower than 0.001. In this case, each entry of the particles position vector is randomly reinitialized with probability 1/d. The second criterion, which tries to identify when particles are moving ineffectively, consists in detecting when the overall change of g t is lower than 10 −8 for 10 · d/pop iterations and regenerating particles positions using the following equation: The last one is velocity clamping [10], [42] and consists in restricting the values of each dimension in the velocity vector of a particle within certain limits to prevent overly large steps. This is done using the following equation: where v j max and −v j max are maximum and minimum allowable value for the particle's velocity in dimension j. The value v j max = [(ub j − lb j )/2] is set according to the lower lb j and upper ub j bounds for dimension j on the search space.

V. EXPERIMENTAL PROCEDURE A. Benchmark Problems
We conducted experiments on a set of 50 static benchmark continuous functions belonging to the CEC'05 and CEC'14 "Special Session on Single-Objective Real-Parameter Optimization" [43], [44], and to the Soft Computing (SOCO'10) "Test Suite on Scalability of EAs and Other Metaheuristics for Large-Scale Continuous Optimization Problems" [45]. A detailed description of the benchmark functions can be found in the given references and in the supplementary material [4] of this article.
The test set of continuous functions- Table V

B. Experimental Setup
The computational budget used with irace was of 50 000 executions for creating the PSO-X algorithms and of 15 000 executions for tuning the parameter of the PSO variants included in our comparison. The reason for using different budgets is that there are 58 parameters involved in the creation of the PSO-X algorithms, and only between 5 and 10 parameters in the tuning of the PSO variants. The functions employed for creating and configuring the algorithms with irace (i.e., the training instances) used d = 30, and the ones used for our experimental evaluation used d = 50 and d = 100, depending on the scalability of each function. In order to present statistically meaningful results, we perform 50 independent runs of each algorithm on each function and report the median (MED) result-to measure the quality of the solutions produced by the algorithms-and the median error (MEDerr) with respect to the best solution found by any of the algorithms.
In all cases, the algorithm was stopped after reaching 5000×d objective FEs. Both the tuning and the experiments were carried out on a single-core Intel Xeon E5-2680 running at 2.5 GHz with 12-Mb cache size under Cluster Rocks Linux version 6.0/CentOS 6.3. The PSO-X framework was codified using C++ and compiled with gcc 4.4.6. 3 The version of irace is 3.2.

VI. ANALYSIS OF THE RESULTS
The analysis of the results is divided into two parts. In the first part, we analyze the performance and capabilities of six automatically generated PSO-X algorithms, named PSO-X all , PSO-X hyb , PSO-X mul , PSO-X uni , PSO-X cec , and PSO-X soco . Each of these PSO-X algorithms has been created using a set of training instances composed of different functions. For PSO-X all , we used all the 50 functions (f 1−50 ), whereas for PSO-X uni , we used only the unimodal functions (f 1−12 ), for PSO-X mul only the multimodal ones (f 13−26 ) and for PSO-X hyb , only the hybrid compositions (f 27−50 ). In the case of PSO-X cec and PSO-X soco , we used the entire set of functions of the CEC'05 and SOCO'10 test suites, respectively. Unlike the SOCO'10 test suite, the CEC'05 competition set includes many rotated objective functions and more complex hybrid compositions. The idea of using different training instances is to try to identify the algorithm components that result in higher performance when tackling functions of different classes.
In the second part, we compare the performance of our automatically generated PSO-X algorithms with ten well-known variants of PSO. We used two versions of each PSO variant: one whose parameters were tuned with irace (indicated by "tnd") and the other that uses the default parameter settings proposed by the original authors (indicated by "dft"). The variants included in our comparison are as follows.

2) Fully informed PSO [35]-a traditional PSO variant that
uses the constriction coefficient velocity update rule 4 (CCVUR) and the MoI-fully informed.

4)
Gaussian "bare-bones" PSO [46] (GauPSO)-a variant that uses the DNPP-Gaussian option of the DNPPadditive stochastic as the only mechanism to update particles positions.

5) Hierarchical PSO [38] (HiePSO)-a variant based on
Top-hierarchical that can be implemented using either ω 1 = linear decreasing or ω 1 = linear increasing. 6) Incremental PSO [34] (IncPSO)-a variant of PSO that uses the CCVUR and Pop-incremental with Init-horizontal.  Table IV, we show the parameter configuration of the versions that we used in the comparison. Note that, with the goal of simplifying their description, we have only mentioned the components that are different in these algorithms from those in StaPSO. This means that, unless specified otherwise, we assumed that the following components and parameters setting are used in their implementation: Pop-constant, Topfully connected with MoI-best-of-neighborhood, DNPPrectangular with Mtx-random diagonal and Pert info = Pert rand = none, ω 1 = constant, ω 2 = 1.0, ω 3 = 0, and AC-constant.

A. Comparison of Automatically Generated PSO Algorithms
The algorithm components in the automatically generated PSO-X algorithms are listed as follows, and their configuration is given in Table V. 1) PSO-X all : Pop-incremental with Init-random, Topfully connected with MoI-best-of-neighborhood, DNPP-rectangular with Pert info -Lévy and PMsuccess rate, Mtx-random diagonal, and velocity clamping.

5) PSO-X cec : Pop-constant, Top-Von Neumann with
MoI-best-of-neighborhood, DNPP-rectangular with Pert info -Lévy and PM-success rate, Pert rand -noisy with PM-success rate and Mtx-random diagonal.   ("Av.MED"), the average ranking of the algorithm across all 50 functions ("Av.Ranking"), and whether the overall performance of any of the compared algorithm was significantly worse ("+") or equal ("≈") than the best-ranked algorithm according to a Wilcoxon's rank-sum test at 0.95 confidence interval with Bonferroni's correction. PSO-X all was the algorithm that ranked best of the six followed by PSO-X cec and PSO-X hyb , while PSO-X soco was the one that returned the best median result in the higher number of cases. The symbol "+" next to some of the median values in Table VI indicates the cases where we found a statistical difference function-wise in favor of PSO-X all according also to a Wilcoxon-Bonferroni test with α = 0.05. PSO-X uni , which ranked last of the six, was the algorithm that performed statistically worse than PSO-X all in most functions (26 out of 50 functions), while PSO-X mul , PSO-X cec , PSO-X hyb , and PSO-X soco were worse in 23, 19, 17, and 14 functions, respectively. In the following, we examine the performance of the six PSO-X algorithms across the different function classes in our benchmark set, focusing on those that are specific to a function class, and the effect of their algorithm differences in their performance.

1) Comparison of the PSO-X Algorithms on Specific Function Classes:
In order to know whether our PSO-X algorithms are able to obtain better results in specific function classes, we analyze their performance according to the average ranking (Av.Ranking) they obtained in the unimodal (f 1−12 ), multimodal (f 13−26 ), hybrid composition (f 27−50 ), and rotated (f rotated = f 2−4,6,14,16,18,20,22−26,42−50 ) functions. The Av.Ranking gives us an indication of how good or bad is the performance of an algorithm across the different classes based on the result of the winner of each function. In Table VII, we present this information together with the algorithms average median error (Av.MEDerr). In our analysis, we pay particular attention to the results of PSO-X uni , PSO-X mul , PSO-X hyb , and PSO-X cec , that are the algorithms we would expect to obtain better results because of the functions used for creating them.
As shown in Table VII, according to the median solution quality, the performance of the algorithms is weakly correlated with the class of functions used with irace. Although PSO-X mul ranked first in its function class of specialization, PSO-X uni was outperformed by all the algorithms in the unimodal functions, PSO-X hyb was outperformed by PSO-X all in the hybrid compositions, and PSO-X cec was outperformed by PSO-X hyb and PSO-X mul in the rotated functions. An analysis of the results using the average median error of the algorithms shows similar results, although, in this case, the performance of PSO-X uni and PSO-X mul was weakly correlated to the class of functions used in their training sets.
There are a few possible reasons why the use of different sets of functions did not have a stronger effect on the performance of our algorithms. The first one is the way in which we separated the functions, that captures some features of the functions, but neglects others, such as separability, noise, and different combination of objective functions transformations. 5 Another possible reason is the presence of slightly overfitted models during the creation of these algorithms with irace. The effect of overfitting can be observed more clearly for PSO-X uni than for the rest of the algorithms. For a number of functions (e.g., f 6,12 ) the median solution obtained by PSO-X uni was significantly better than that of the other algorithms, which contributes to lower the value of the Av.MED and Av.MEDerr metrics, but not to improve its ranking in its respective classes of specialization. Among the possible causes for the overfitting are the use of training sets with different number of instances (PSO-X uni has 12 instances, while the best-ranked algorithm, PSO-X all , has 50) and of an exceedingly large computational budget used with irace.
2) PSO-X Algorithm Differences: The first thing to note about the design of the six PSO-X algorithms is that, despite they were created using different sets of functions, they all share the same core components, i.e., DNPP-rectangular with Pert info -Lévy or Pert info -Gaussian. This combination of components, as we discuss in Section IV-B, has the ability of making the implementation rotation invariant, which is an important characteristic given that 22 out of the 50 functions in our benchmark test set have a rotation in their objective function. In all cases, the strategy to control the PM was PM-success rate and, with the exception of PSO-X uni , they all have a parameter setting where f c is larger than s c . This setting allows to decrease rapidly the PM when particles have been constantly improving the global best solution, but makes harder to switch back to a larger PM if the algorithm happens to stagnate. In this sense, PSO-X all , PSO-X hyb , PSO-X mul , PSO-X cec , and PSO-X soco are biased toward exploitation, and PSO-X uni toward exploration.
Although PSO-X hyb and PSO-X cec obtained similar results in most functions and ranked almost the same across the whole benchmark set, the performance of PSO-X cec was better in the CEC'05 hybrid compositions (f 41−50 ), and worse in functions f 27 , f 30 , f 31 , and f 34 that belong to the SOCO'10 test suite. Based on the components and parameter setting in PSO-X hyb and PSO-X cec , this difference can be attributed to the Pert rand component that is present only in PSO-X cec . The Pert rand component was advantageous for PSO-X cec to tackle the more complex search spaces of the CEC'05 hybrid compositions, where the algorithm performed its best, but affected its solutions quality in most of the SOCO'10 test suite hybrid compositions. Another interesting comparison can be done between PSO-X all (ranked first) and PSO-X uni (ranked last). These two algorithms have the exact same components and differ only in the population size, which is roughly three times larger in PSO-X uni compared to PSO-X all ; parameter ω 1 , which is equal to ω 2 in PSO-X uni and random in PSO-X all ; and parameters f c and s c , whose value is inverted in PSO-X uni compare to PSO-X all (see Table IV). Data from Table VI shows that the configuration of PSO-X uni is quite performing to tackle functions with large plateaus and quite regular landscapes, such as Elliptic (f 2 ), Schwefel (f 6 , f 8 , and f 12 ), or Rosenbrock (f 15 and f 16 ), where PSO-X uni was the best performing of the six. However, when PSO-X uni faced less regular and multimodal landscapes, its performance declined significantly. Given the large number of parameters in PSO-X compared to most PSO variants in the literature, framework users could be interested in obtaining information about the sampling distribution of the parameters and the way in which they interact with each other. We present this information in Section 4.2 of the supplementary material [4].

B. Comparison With Other PSO Algorithms
We also compared our PSO-X algorithms with ten traditional and recently proposed PSO variants. As mentioned before, for each algorithm, we collected data using both a default (dft) version-that uses the parameter settings proposed by the authors-and a tuned (tnd) version-whose parameters were configured with irace. Based on a Wilcoxon-Bonferroni test at α = 0.05, we selected the best performing of the two versions of each algorithm. However, since the computed p-values were larger than 0.05 for enhanced rotation-invariant PSO (ERiPSO), Frankenstein's PSO (FraPSO), and SPSO11, we selected the version that obtained the lower median value across the 50 functions. In Table VIII, we show the median of the 50 runs  executed by each algorithm for each function and, in Table IX, we show the mean ranking obtained by each algorithm according to the different classes in which we separated the functions in the benchmark set. To complement the information given in the tables, in Section 6 of the supplementary material [4], we present the distribution of the results obtained by the 16 compared algorithms using box plots.
In terms of the median solution quality, except for PSO-X uni , the performance of the automatically generated PSO-X algorithms was better than any of the PSO variants in our comparison. PSO-X cec obtained the best ranking followed by PSO-X mul and PSO-X hyb , and it was also the algorithm that returned the best median value in most functions. Regarding the performance of the algorithms on specific problems classes, PSO-X cec obtained the best ranking according to the Av.MED result in the unimodal and rotated functions, PSO-X mul the best one in the multimodal functions, and PSO-X all the best one in the hybrid functions; whereas the algorithms that obtained the lower Av.MEDerr were LcRPSO dft in the unimodal and rotated functions, and PSO-X mul in the multimodal and hybrid composition functions. To put these results in context, in Table IX, we have used boxes to highlight the results of the PSO variants whose ranking was equally good, or better, than any of the PSO-X algorithms.
Note that only IncPSO tnd and StaPSO tnd were capable of outperforming the results obtained by some of the PSO-X algorithms, especially in the rotated functions, where those two algorithms were as competitive as those automatically generated. However, in the case of the hybrid composition functions the results are quite compelling in favor of PSO-X , since even the worst automatically generated algorithm performed significantly better than any of the PSO variants. This is a very strong point in favor of our PSO-X algorithms not only because half of the functions in our benchmark set are hybrid compositions, but also because these kind of functions are the hardest to solve and the most representative of real-world optimization problems.
According to Wilcoxon pairwise tests between PSO-X cec and the PSO variants using the data presented in Table VIII, the median solution values obtained by FinPSO tnd , StaPSO tnd , and IncPSO tnd are not statistically different from PSO-X cec . FinPSO tnd was the best performing of the PSO variants in the unimodal functions, and IncPSO tnd in the multimodal and rotated functions. The three PSO variants have some commonalities regarding their design, including that they all use low connected topologies (Von Nemann and ring) during most of their execution (see Table IV) and, in the case of FinPSO tnd and IncPSO tnd , they both use the CCVUR. While our experimental results show that only IncPSO tnd and StaPSO tnd are clearly better than one of our algorithms (PSO-X uni ) across the whole benchmark set, the three PSO variants (FinPSO tnd , StaPSO tnd , and IncPSO tnd ) produced results that are competitive with the PSO-X algorithms in some specific classes of functions. Finally, it is worth pointing out that none of the default versions of the PSO variants that we included in our comparison (i.e., ERiPSO dft , FraPSO dft , and LcRPSO dft ) performed as well as   the ones that were configured with irace in terms of median solution quality. It is particularly interesting the case of StaPSO and FinPSO, whose performance improved dramatically after the configuration process. However, it is extremely common to see these two variants implemented with default parameters in many papers proposing and comparing new algorithms.

C. Are PSO-X Implementations Convergent?
Local convergence is one of the most salient characteristics of high-performing PSO implementations. It prevents unwanted roaming (which often results in particles leaving the search space) and allows particles to improve the initial solutions for any number of dimensions. Creating convergent implementation using PSO-X is possible because 1) the value of the three main parameters of PSO (ω 1 , ϕ 1 , and ϕ 2 ) is limited within the theoretical region where local convergence is expected to occur 6 and 2) PSO-X is implemented both a number of algorithm components that have been shown to result in particles local convergence (e.g., DNPP-spherical, Pert info -Gaussian, Pert info -Lévy, etc.) and strategies that limit the magnitude of the velocity vector (velocity clamping).
As it can be observed in Tables IV and V, the six PSO-X algorithm that PSO-X automatically created use the two main algorithm components proposed for the LcRPSO (i.e., DNPP-rectangular and Pert info -Gaussian). 7 In [20], it was formally and experimentally demonstrated that LcRPSO is locally convergent because the Pert info -Gaussian component satisfies the local convergence condition, which ensures that the mapping between the input vector r and the perturbed vector N ( r, σ t ) is located in any definable region of the search space (see [20,Appendix 1] for the formal definition of this condition). Although our PSO-X algorithms are six different specializations of LcRPSO, by using a number of algorithm components that were found to be good design choices during the configuration process, they exhibit better performance than any of the variants considered in this study. We believe this shows the power of combining automatic configuration and component-based framework to create high-performing algorithms.

VII. CONCLUSION
In this article, we have proposed PSO-X , a flexible, automatically configurable framework that combines algorithm components and automatic configuration tools to create highperforming PSO implementations. Six PSO algorithms were automatically created from the PSO-X framework and compared with ten well-known PSO variants published in the literature. The results obtained after solving a set of 50 benchmark functions with different characteristics and complexity showed that the automatically created PSO-X algorithms exhibited higher performance than their manually created counterparts.
In PSO-X , we have incorporated many relevant ideas proposed in the literature for the PSO algorithm, including different topologies, models of influence, and ways of handling the population; several strategies to set the value of the algorithm parameters; a number of ways to construct and apply random matrices; and various kinds of distributions of particles positions in the search space. With PSO-X , we seek to provide a tool that can simplify the application of PSO to tackle continuous optimization problems, and also to bring clarity on the main design choices available when implementing it. There is, however, one clear limitation in our work: since PSO is an intensively studied algorithm with hundreds of variants, including in PSO-X the totality of the ideas proposed for this algorithm is challenging. Hence, a continuous effort must be done to keep adding new algorithms to PSO-X so that implementations remain competitive with the state of the art.
As future work, we are planning to explore two directions. The first one is to create a version of PSO-X from which hybrid PSO algorithms can be created; we are particularly interested in including components from exact methods (e.g., Nelder-Mead Simplex method [47]) and from evolutionary computation (e.g., evolutionary random grouping [36]), which have been shown 7 Note that, in addition to Pert info -Gaussian, it is possible to use the Pert info -Lévy component, since the Lévy distribution is a generalization of the Gaussian. to be highly competitive and even the state of the art for many problems. The second direction consists of extending PSO-X with components from recent stochastic optimization algorithms, in particular those that are controversial (see [48], and the references in this article), in order to see if we can highlight similarities between those algorithms and what have been proposed in the context of PSO.