Dynamic Interconnection Approach With BLX-Based Search Applied to Multi-Swarm Optimizer: An Empirical Analysis on Real Constrained Optimization

The Multi-Swarm approach allows the use of multiple configurations between two or more populations of particles, where each one can follow a different approach (e.g. lbest, gbest, Unified, Guaranteed-Convergence) directed towards improving the optimization process. This article presents a proposal for local/global stochastic interconnection applied to the context of the Multi-Swarm algorithm, as well as a local search method for refining previously obtained solutions. Two proposals are introduced for this new Multi-Swarm PSO (MSO). The first is the inclusion of "counterpart particles", which establishes a sub-topology between inter-swarm particles, accessed through migration and availability rules. The second involves customized crossover operators based on the BLX (Blend Crossover) scheme, with direction information used as a reference for establishing a search subspace around the particles. Performance and robustness of the new approaches were assessed on ten constrained engineering design optimization problems (COPs), and the results are compared to other solutions already published in the scientific literature. Results indicate significant performance improvements for all 10 COPs when compared to concurrent-based MSOs. By making references from other swarms available, the counterpart-particles approach tends to improve the optimization process in the search space, while an intermediate layer of local search based on a modified directed BLX crossover provides an extra search around the particle, thus refining previously obtained solutions.

search space (geographical and spatial location), while the food sources refer to the current local best position (particle) or global optimum (swarm) location for the problem [1], [8].
Six important characteristics make PSO a quite attractive metaheuristic in the search and optimization context [9], [10]: 1) Intrinsic memory associated with each solution (local); 2) Capacity for information exchange; 3) Search based on references to good prior results; 4) High degree of parallelism; 5) Ease of implementation; 6) Rapid convergence. Because of those attributes, PSO has been successfully applied to a diverse family of problems, with hybrid or derived versions being used in several fields of scientific knowledge (e.g. engineering, antenna design, robotics, machine learning) [11]. Such variations range from modifying control parameters and topologies to introducing new learning techniques [2], [11]-[14].
Likewise, the use of multi-swarm structures is also noteworthy. They present algorithms with high diversification, flexibility and strategies directed towards the control and flow of information between different populations [9], [15]-[17]. Several proposals involving the use of cooperative PSOs organized into sub-swarms have been published in the scientific literature, which indicate superior performance compared to mono-swarm implementations [17]. Examples of multi-swarm PSOs (MSOs) include versions with topology configurations between populations and data transfers related to global best results, as well as strategies involving communication hierarchies between swarms (e.g., master-slave) [9], [17]-[19].
Parallel to the multi-swarm methods, several hybrid versions of the PSO have been developed. Optimization strategies that include two or more search mechanisms working together with the original mechanism tend to increase efficiency in the use of the search space, as well as providing additional resources for controlling diversity and local searches [20]. Considering the possibilities for improvement and the customization capacity of multi-swarm PSOs, as well as the addition of techniques focused on sweeping the search space beyond the mechanisms found in the original PSO, hybridizing particle swarms with crossover-based solutions for local search in subspaces presents a range of possibilities for improving the optimization process.

A. PROBLEM DISCUSSION
Because it is a tool originally designed with a strong tendency towards refining previously obtained results, the original PSO [1] has some limitations in balancing exploration (diversification of solutions in the search space [21]) and exploitation (search refinement in promising regions [21]): due to its rapid convergence, there is a risk of stagnation at the local bests [9], [22]-[24]. In contrast to that scenario, and with a history of five decades of development, GAs have emerged as among the most successful evolutionary algorithms, mainly because of their capacity for adaptation regarding mechanisms for diversification [25]-[27].
Given the number of operators and adjustable parameters, as well as a great variety of crossover techniques, such algorithms present a better balance between exploration and exploitation. More specifically, the use of crossover between solutions and mutation is capable of generating new combinations with greater balance between exploration of the search space and refinement of the best results already found [9], [28]-[30]. The combination of such mechanisms in a multi-swarm environment with a strongly connected topology, together with the exchange of data between the best sites and globally between all swarms through context-dependent migration rules, should provide a sophisticated optimization environment, with rapid convergence and greater use of the solution space through the so-called "variant particles". A basic scheme for the proposed algorithms can be seen in Figure 1. Considering the mechanisms presented in Figure 1, this paper expands the research on hybrid multi-swarms begun in [31], [32], introducing a set of migration rules adapted for rapid access to memory data belonging to neighboring swarms (i.e., counterpart particles, best global value among all swarms), besides a control scheme based on stochastic rules for allowing access to swarm components (local/global stochastic access factor). At the particle level, each solution is equipped with mechanisms for generating variations based on its own values within a defined search subspace, established using the information on the direction (or trend) of the respective position data. New particles are then generated, based on the modified BLX method, inside that search space, providing a greater sweep capacity in an N-dimensional function.
This article is divided into the following sections: Related works and the state of the art (Section 2); Theoretical basis for classical particle swarms and Multi-Swarm Optimization, and presentation of the mechanisms of the proposed FC-CPMSO algorithm, besides its versions with "variant particles" generated by a modified BLX crossover, FC-CPMSO-V1/V2 (Section 3); Experiment setup, with a description of the benchmark functions, the thread organization in CUDA, and the results obtained from the tests (Section 4); Final remarks and conclusion, with commentary regarding the advantages and drawbacks of the proposed algorithms (Section 5).

II. A BRIEF RETROSPECTIVE AND RELATED WORKS
This paper may be considered an indirect outcome of the studies published in [31], [32], which developed a set of seven algorithms based on the classical and quantum particle swarm paradigms, both modified to include the Evolutionary Strategies scheme derived from the EPSO algorithm of Vladimiro Miranda and Nuno Fonseca [33]. Of the seven solutions proposed, five are of the multi-swarm type with master-slave topology following the model of Niu [18] and two are of the mono-swarm type [31], with all being implemented on the heterogeneous GPGPU computing paradigm powered by the CUDA architecture.
Despite the good results achieved by including EPSO's ES model in the context of multi-swarm versions of PSO [32] and QPSO [31], the experiments were conducted using a low number of constrained functions, with no comparative analysis between the master and slave swarms, thus limiting the assertion of more precise conclusions.

A. RELATED WORKS ON HYBRID PSO WITH GAS
The use of mechanisms derived from Evolutionary Strategies or Genetic Algorithms in customized versions of PSO is a well-established approach, with several papers available in the scientific literature, such as the previously highlighted EPSO [33]. Besides hybridization between PSO and GA, other variants of the particle swarm optimization algorithm with refinement of the search process based on other techniques (e.g. mathematical and numerical optimization, local search) have been applied in a wide variety of research fields [36]-[39].
Duong et al. [22] present a hybrid algorithm based on particle swarm optimization and real-coded genetic algorithms, in which PSO is used to update the solutions, while the GA recombination operators produce offspring by crossover methods applied to pairs of particles. In Ali et al. [40], two combinations of PSO and GA are applied to high-dimensional global optimization, as well as to the development of electromagnetic structures and antenna modelling. In Grimaccia et al. [20], the population is divided at each iteration and then optimized by both techniques (PSO and GA) simultaneously, one for each portion of the population, while in Robinson et al. [41] and Ali et al. [40], PSO and GA are performed sequentially.
Such implementations generally vary between (or include) the following options: 1) Splitting the population between both algorithms [40], [41]; 2) Scaling PSO for exploitation and GA for exploration [22]; 3) Modifying the equations for velocity and position of the particle swarm to include mechanisms for mutation and/or crossover (position and velocity) applied to pairs of particles from the same swarm [42], [43].

B. RELATED WORKS ON SINGLE OR MULTI PSO ON CONSTRAINED OPTIMIZATION (COP)
Several articles regarding the use of PSO for constrained optimization (COP) are briefly reviewed in this subsection; their results show good performance on that class of problems [16], [44]-[46]. The implementations described cover a range of strategies for controlling and managing functions with constraints, including fitness normalization through basic [47], [48] and dynamic [49] penalty factors, feasibility rules [50], [51], optimization of the objective function and constraints separately [52], and others.
In Liu et al. [53], an adaptive approach for constrained problems is implemented in order to maintain a fixed proportion of non-viable solutions, employed in cases where there are viable particles with fitness values higher than all the solutions with violations. Coath and Halgamuge [54] presented a comparison of two methods for handling constraints in the context of the PSO algorithm (i.e. preservation of feasible solutions [55], and application of a dynamic penalty function [56]). It was demonstrated that the convergence rate observed in PSO based on normalization by penalty factor tends to be faster when compared to methods that use only feasible solutions in the search process [45], [54].
With regard to multi-swarm environments, Zheng et al. [57] propose an aggregation of neighboring particles in several sub-swarms in order to explore a subspace of feasible solutions (without violations). Yang et al. [44] proposed a master-slave PSO hierarchy in which the master swarm is responsible for optimizing the objective function while the slave swarm searches for feasibility in the constraints. Zhao and Li [58] introduce a two-stage optimizer: 1) Creation of sub-swarms so as to increase the chances of achieving better results; 2) Merging of the sub-swarms into a large swarm to refine the global best.

VOLUME 4, 2016

Despite the improvements proposed by the solutions described above, such algorithms have limitations. The inclusion of a GA applied to the PSO in a co-optimization context tends to increase the algorithm's complexity, while the use of specific mechanisms found in the GA (modified crossover) adapted to the PSO should provide an efficient, yet less complex, hybrid metaheuristic.
In the context of multi-swarm algorithms, the use of a master-slave topology entails the sequential execution of more than one layer of swarms, which causes data dependency. A concurrent approach using only a single set of interconnected swarms, where an environment based on information exchange ruled by a specific set of rules (synchronous execution of each population), would decrease data dependency as well as solution complexity when compared with master-slave topology.
By addressing these issues, the present paper introduces the following mechanisms and improvements:
1) Intercommunication between the best locations of corresponding particles through counterpart particles (CP): We propose the inclusion of components associated with the best local value of all solutions connected in a trans-swarm neighborhood, with such equivalent particles identified by the same index. The "counterpart particles" provide additional information from their neighbors, located in other swarms.
2) A set of migration rules with tolerance threshold, adapted to constrained optimization (TMR): The values related to the local and global bests used by the particles are selected through comparison between the counterparts' local bests and the overall global result found by the swarm set, respectively. Seeking to avoid the prevalence of exclusive access to the best components, a tolerance threshold is applied to the migration rules in order to increase the odds of a "more than two components" scenario in the velocity update. This approach is designed to include up to four components for adjusting the particle trajectory.
3) Probabilistic Access Factor (PAF): Based on the stochastic star topology proposed by Miranda, Keko and Duque [59], the probabilistic access factor is extended to all components: the particle's local best, the best counterpart's local best, the swarm's global best and the best global value achieved by the multi-swarm environment. This scheme provides an option for alternating between a more explicit (selfish) local search and a more exploitative approach applied to the global best (selfless), both governed by probability thresholds.
4) Variant particles generated in search subspaces by a direction-based crossover method (V1/V2): In order to expand exploration capabilities, each particle present in the swarm may generate up to two new alternative solutions. The variant particles are obtained through a modified crossover scheme based on the direction history (trajectory) of the original particle, where a set of positions inside the search subspace is randomly selected as a reference for devising new solutions.
5) Dynamic limits applied to the inertia weight and acceleration coefficients: Based on the studies conducted by Harrison et al. [60], the inertia weight (w) is calculated through modified stochastic functions and varies according to the iteration interval. For the acceleration coefficients (C_l, C_c, C_g, C_s), also obtained through a customized stochastic function, the intervals are determined by combining the available components (see TMR and PAF), which must conform with the convergence parameters established by Poli's theoretically defined region.
It is important to note that, besides the proposals mentioned above and based on data obtained through experiments, this article seeks to analyze the impact and viability of such modifications to the PSO, both in the isolated use of variant particles (4) and the probabilistic access factor (3), and when implementing such approaches in the multi-swarm context presented herein (1 and 2).
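As a minimal illustration of the variant-particle idea (item 4), the sketch below generates one candidate inside a subspace around a particle, shifted along the sign of its recent direction and blended BLX-style. The interval construction, the alpha value and the span are illustrative assumptions, not the paper's exact operator.

```python
import random

def blx_variant(position, direction, alpha=0.5, span=0.1):
    """Generate one variant solution inside a subspace around `position`.

    The subspace is shifted per dimension along the sign of the recent
    direction (trend); values are then drawn BLX-style from an interval
    extended by `alpha`. All parameter choices here are illustrative.
    """
    variant = []
    for x, d in zip(position, direction):
        # Demarcate the search subspace around x, biased by the trend sign.
        lo = x if d >= 0 else x - span
        hi = x + span if d >= 0 else x
        width = hi - lo
        # BLX-style extension of the interval by alpha on both sides.
        variant.append(random.uniform(lo - alpha * width, hi + alpha * width))
    return variant

random.seed(1)
v = blx_variant([0.0, 1.0], [1.0, -1.0])
```

In the proposed algorithms, a variant produced this way would replace the original particle only if it proves superior under the constrained-fitness comparison.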

III. THEORETICAL BACKGROUND
So as to adapt the PSO algorithm for better use of the search space, the swarms of the proposed methods, besides the previously defined trajectory data (i.e., position, velocity, fitness, normalized fitness), provide information associated with a direction trend based on the particle movement history, applied both for demarcating the search subspace around the original particle and in the modified crossover process for generating variant particles. Should the solutions produced by this direction-based crossover prove superior, they will replace the original one and be passed on to the next iteration. In a matrix programming notation, Figure 2 presents the main variables for an individual in the context of PSO algorithms, with elements related to the variant particles (search subspace limits) and direction information. Variables i, j, k and r represent the following indicators: Particle (i); Position or velocity (j); Swarm (k); Constraints (r).
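In code, the per-particle record described above (and in Figure 2) can be sketched as the following structure; the field names and types are illustrative assumptions derived from the variables listed in the text, not the paper's exact layout.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Particle:
    """One individual of the k-th swarm (indices: i = particle,
    j = dimension, r = constraint). Field names are illustrative."""
    position: List[float]             # X[i][j]
    velocity: List[float]             # V[i][j]
    fitness: float                    # f[i]
    normalized_fitness: float         # nf[i]
    constraints: List[float]          # values of the r constraints
    direction: List[float] = field(default_factory=list)    # movement trend per dimension
    subspace_lo: List[float] = field(default_factory=list)  # variant-search lower limits
    subspace_hi: List[float] = field(default_factory=list)  # variant-search upper limits

p = Particle([0.0, 0.0], [0.1, -0.2], 3.5, 0.7, [0.0])
```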
Through this new set of data and mechanisms, it is possible to expand the PSO particle search capacity beyond the classic velocity update, by adding procedures where the information obtained by all the swarms is available simultaneously during the optimization process (see Figure 3). One should note that the process of information exchange between swarms also benefits from the generation of variant particles (V1/V2), which improves the search by generating new solutions in the subspace area around the original particles.
Every component associated with a particle (presented in Figure 2) is organized through one-dimensional, two-dimensional or three-dimensional data structures. The memory components (presented in Figure 3) are described as follows:
• Local best for the i-th particle and counterpart particle (c): 3D and 2D matrices for position (P_l, P_c); 2D and 1D matrices for fitness (fP_l, fP_c); 2D and 1D matrices for normalized fitness (nfP_l, nfP_c); 2D and 1D matrices for the number of violated constraints (vfP_l, vfP_c); 3D and 2D matrices for constraint values (rfP_l, rfP_c);
• Global best for the k-th swarm (g) and multi-swarm (s): 2D and 1D matrices for position (P_g, P_s); 1D matrices for fitness (fP_g, fP_s); 1D matrices for normalized fitness (nfP_g, nfP_s); 2D and 1D matrices for the number of violated constraints (vfP_g, vfP_s); 2D and 1D matrices for constraint values (rfP_g, rfP_s).
Based on the aforementioned structures, the algorithms developed in this paper present an intercommunication scheme between particles from different swarms during the velocity update, organized into customized migration and access rules, thus allowing not just a new list of references for the particles, but a dynamic variation between different combinations of components applied to the velocity update equation. More details are presented in Subsection III-C1.
The canonical version of the algorithm developed by Kennedy and Eberhart [1], [61] is a search method organized into clusters in a swarm structure [1], [58], [62]. It initializes (randomly or through preprocessed data) the values for position (X[i][j]) and velocity (V[i][j]) in the D-dimensional space of each i-th particle, where D - 1 is the maximum limit allowed for the dimension index j, P is the number of particles in a single swarm and T is the number of iterations. Later on, all solutions (particles) are submitted to fitness evaluation (f[i]).
The optimization process consists of adjusting the trajectory based on two references: Best individual position or local best (P l [i] [j] ) and best position obtained by all the particles in the swarm or global best (P g [j] ). At each iteration, the particles are updated by means of Equations 1 and 2.
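In code, the canonical update of Equations 1 and 2 can be sketched as follows; this is a minimal sketch with illustrative default parameter values, not the paper's tuned configuration.

```python
import random

def update_particle(x, v, p_local, p_global, w=0.7, c_l=1.5, c_g=1.5):
    """Classic PSO update (Equations 1 and 2): new velocity from inertia,
    individual memory (local best) and collective memory (global best),
    then the new position as position plus velocity."""
    new_v, new_x = [], []
    for j in range(len(x)):
        r1, r2 = random.random(), random.random()        # R1, R2 ~ U(0, 1)
        vj = (w * v[j]
              + c_l * r1 * (p_local[j] - x[j])           # cognitive component
              + c_g * r2 * (p_global[j] - x[j]))         # social component
        new_v.append(vj)
        new_x.append(x[j] + vj)                          # Equation 2
    return new_x, new_v
```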
Variables found in Equations 1 and 2 are defined as follows: Iteration index (t); inertia weight (w); individual and social acceleration coefficients (C_l, C_g); uniform random number generators distributed in the range between 0 and 1 (R_1 = U(0, 1) and R_2 = U(0, 1)). By observing the composition of the velocity update in Equation 1, Figure 4 exemplifies the use of components in the particle movement, highlighting the individual memory (local best, P_l[i][j](t)), the collective memory (global best, P_g[j](t)), and the inertia applied to the velocity component. Inspired by the original PSO implementation, new mechanisms have been included to improve the algorithm, notably in controlling velocity and in the methods for boundary conditions, which seek to correct values for position and velocity outside of the established intervals [63]. Equations 4, 7 and 8 present the boundary conditions for position (damping) and velocity (clamping).
• Modified velocity clamping (factor = 0.5): Includes a condition and procedures for treating a particle in a stationary state, where V[i][j] = 0;
• Boundary conditions (damping): Repositioning particles in the search space.
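The two boundary conditions above can be sketched as follows. The stationary-state re-injection rule and the reflected-velocity damping factor are illustrative assumptions; the paper's exact formulas are given in Equations 4, 7 and 8.

```python
def clamp_velocity(v, v_max, stall=0.5):
    """Velocity clamping: cap |v| at v_max; if the particle is stationary
    (v == 0), inject a fraction of v_max so it keeps moving. The 0.5
    stall factor mirrors the text; the re-injection rule is an assumption."""
    if v == 0.0:
        return stall * v_max
    return max(-v_max, min(v_max, v))

def damp_position(x, v, lo, hi, damping=0.5):
    """Boundary condition (damping): place the particle back on the
    violated bound and reverse a damped fraction of its velocity."""
    if x < lo:
        return lo, -damping * v
    if x > hi:
        return hi, -damping * v
    return x, v
```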
Considering those modifications described above, Algorithm 1 demonstrates the classic PSO with variable inertia weight (e.g., linear decrease) first published by Shi and Eberhart [8].
Obtain the i-th particle's local best (P_l, fP_l, vP_l); Obtain the swarm's global best (P_g, fP_g, vP_g);

B. MULTI-SWARM OPTIMIZATION
Multi-swarm models based on cooperative or competitive PSO schemes have been introduced over the years to explore the search space more efficiently [17]. These algorithms use swarms executing in parallel while sharing information or evolving concurrently. Based on the taxonomy developed by El-Abd and Kammel [17], it is possible to establish categories for the classification of cooperative PSOs, as well as an extrapolation to competitive models where there is data exchange between swarms.
The application of cooperation and competition structures in PSOs tends to follow the same rationale as multi-population solutions implemented in other search algorithms (e.g. Island GAs), with the main differences lying in the communication methods between the groups of solutions. Cooperative search algorithms were extensively studied in the first decade of the 21st century to effectively solve many large optimization problems, where the basic approach involves the use of one or more search clusters (e.g. swarms, islands) exchanging information in order to explore the search space more efficiently and, consequently, obtain better solutions [5], [17].
In the context of the algorithms FC-CPMSO, FC-CPMSO-V1 and FC-CPMSO-V1/V2, four swarm organization models were used as the basis for the proposed topological structure. Each of them has its own information exchange schemes, as well as different search space definition rules. Regarding the main changes of these types of algorithms compared to the classic version of the PSO [1], [8], the inclusion of new components in the velocity update calculation stands out, sharing global-best-related information across all swarms.
All four approaches described below are classified by [17] as algorithms for use in static optimization (the class of optimization problems with only one global optimum), and can be seen in the corresponding figure: 1) Hybrid cooperative PSO: It starts with the execution of CPSO_H, followed by a standard PSO, which receives the solution vector resulting from the previous operation; 2) Multi-Population Cooperative PSO (MCPSO) [18]: Swarms organized into two cluster layers, where the slave swarms do not communicate with each other, while at the same time they send their global best values to the master swarm, which selects the fittest among them; 3) Concurrent PSO (CONPSO) [64]: An algorithm based on simultaneous optimization, where swarms executing different types of PSO share the global best among them, which increases the search capacity. In comparison with the multi-swarm approaches described above, the FC-CPMSO algorithm uses a concurrent structure to transfer data between the swarms, including global bests and counterpart particles, whereas in the FC-CPMSO-V1/V2 versions (see Subsection III-D), search subspaces are established based on the particle's trajectory and direction.
For the purpose of improving search performance in the multi-swarm PSO algorithm, two additional components are included in the particle movement process: 1) Local best values for the set of interconnected particles (inter-swarm neighborhood), or counterpart particles (P_c[i][j](t)); 2) Global best result achieved by all the swarms (P_s[j](t)). The availability of that information during the search process seeks to expand the optimization capacity by accelerating the convergence process, with external references influencing the velocity update.
Unlike the master-slave approach proposed by [18] and adapted to other metaheuristics in later works [31], the FC-CPMSO algorithms operate in a competitive multi-swarm environment. In that scenario, the extra components are associated with the information shared between particles from different populations operating competitively, thus excluding the use of master swarms. Figure 6 illustrates four FC-CPMSO swarms, where each particle is connected with its counterparts located in other swarms. According to the scheme exposed in Figure 6, all particles located in different swarms with the same label (number) share information with each other (i.e., position, velocity, fitness). They compose a sub-topology structure that connects particles from different swarms, providing additional components and references from other clusters. With a view to customization, the proposed algorithms are implemented so as to allow different topologies (e.g., ring, star) at both the swarm and counterpart levels. Figure 7 illustrates the topological structure present in the proposed algorithms. Each particle is organized in two topologies: fully connected between A) all members of its respective swarm (solid black lines) and B) all the counterparts located in other swarms (dotted magenta lines). In both cases, the information is accessed through the members of a topology which is, by design, complete. Based on the proposed structure (see Figures 6 and 7), each particle has access to the best results obtained by the other swarms, directly impacting the convergence and the capacity for exploring the solution space. The next subsections will explore in detail the equations used in the FC-CPMSO algorithm for trajectory calculation, which also include new access rules for each component and customized stochastic functions for the inertia weight and acceleration coefficients.
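The counterpart sub-topology can be sketched as a lookup over particles sharing the same index across swarms; the flat `local_best_fitness[swarm][particle]` layout is an illustrative assumption, and constraint handling is omitted for brevity.

```python
def best_counterpart(local_best_fitness, i, k):
    """Among the counterparts of particle i (the particles with the same
    index in every other swarm), return the swarm index holding the best
    (lowest) local-best fitness. Minimization is assumed, and
    `local_best_fitness[s][i]` is an illustrative layout."""
    candidates = [(local_best_fitness[s][i], s)
                  for s in range(len(local_best_fitness)) if s != k]
    return min(candidates)[1]
```

For example, with three swarms, particle 0 of swarm 0 would read the local bests of particle 0 in swarms 1 and 2 and adopt the fitter one as its counterpart reference.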

1) The Main Proposed Approach (FC-CPMSO)
The main idea of the FC-CPMSO algorithm consists of adding the aforementioned inter-swarm components (i.e. counterpart particles, global best of all swarms) through an access scheme based on: 1) A comparison of fitness values between components (migration rules, or TMR); and 2) Information (or component) availability during the execution of the i-th particle's velocity update (probabilistic access, or PAF). In order to establish an easy-to-implement solution, all modifications in the FC-CPMSO algorithm are based on conditional rules, which define both the main parameters (i.e. inertia weight, acceleration coefficients) and the components (TMR, PAF).
Regarding the use of migration with tolerance threshold, Equations 9 to 14 establish the rules for TMR. Figure 8 shows every possible combination of components using TMR and PAF inside the search space.
• Particle's local best (fP_l, vfP_l) and counterpart's local best (fP_c, vfP_c);
• k-th swarm's global best (fP_g[k], vfP_g[k]) and all swarms' global best (fP_s, vfP_s).
Through the migration rules described in Equations 9 to 14, the FC-CPMSO algorithm provides a new scheme for velocity update with dynamic behavior regarding access to information from other swarms. The component selection criteria use the fitness values and the number of violations of each solution in order to select which one will be used in the optimization process. One should note that the evaluation methods applied by the TMR rules (represented by the variables θ_l and θ_g) are adapted for constrained functions, with the number of violations as the primary parameter, followed by the fitness values of the memory components. With TMR, every particle can use up to four components at the same time in the velocity calculation, with a competitive or semi-competitive approach. In cases where three or more memory references are used, the values of associated components (i.e. local best and counterpart's local best, global best and all swarms' global best) are reduced by half; that condition is reached when the difference (or tolerance) between the fitness values of associated components (δ_l or δ_g) is less than 2%. A second mechanism found in FC-CPMSO is the probabilistic access factor for each component (PAF). Based on the stochastic star topology published by [59], the PAF method defines the probability with which a given memory component is available for access in the context of the velocity update calculation. It works by including a binary variable (0 or 1) for each memory component associated with the local and global access factors (i.e. ρ_l, ρ_c, ρ_g, ρ_s), expanding the number of possible combinations between them (see Figure 8).
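A minimal sketch of the TMR comparison between one pair of associated components follows; the relative-gap formulation of the 2% tolerance is an assumption about how the tolerance is measured, and minimization is assumed.

```python
def tmr_weight(f_a, v_a, f_b, v_b, tol=0.02):
    """Tolerance-based migration rule between two associated components
    (e.g. local best vs. counterpart's local best): fewer constraint
    violations wins outright; otherwise fitness decides, and when the
    relative fitness gap is below `tol` (2%), both components are kept
    at half weight. Returns (weight_a, weight_b)."""
    if v_a != v_b:                       # violations are the primary criterion
        return (1.0, 0.0) if v_a < v_b else (0.0, 1.0)
    gap = abs(f_a - f_b) / max(abs(f_a), abs(f_b), 1e-12)
    if gap < tol:                        # near-tie: share the component
        return 0.5, 0.5
    return (1.0, 0.0) if f_a < f_b else (0.0, 1.0)
```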
Based on the mechanisms described above, Equations 15 to 19 show the FC-CPMSO's new scheme for velocity update, which includes the TMR migration rules (φ_l, φ_g) and the probabilistic access factor PAF (ρ_l, ρ_c, ρ_g, ρ_s), both working in a single velocity equation (19).
• Counterpart local best component (cL_c);
• Global best component (cG_g);
• All swarms' global best component (cG_s);
• FC-CPMSO's new velocity update equation.
Variables for Equations 15 to 19 are: inertia weight (w); acceleration coefficients for the local, counterpart, global and all-swarms components (C_l, C_c, C_g, C_s); references for the local and global migration rules (φ_l, φ_g); boolean flags for the probabilistic access factors of the local, counterpart, global and all-swarms components (ρ_l, ρ_c, ρ_g, ρ_s); uniform random values between 0 and 1 (R_1, R_2, R_3, R_4). For the inertia weight (w) and acceleration coefficients (C_l, C_c, C_g, C_s), a custom stochastic variation model in a closed interval was selected to be used in combination with FC-CPMSO. Setting the minimum and maximum limits for those variables based on different rules, as well as the procedures (functions) for obtaining the aforementioned factors, is important for establishing the relevance of each component during the optimization process [65]. (The use of 50% of the total value of a given component is called semi-cooperative; that classification is based on the cooperative approach developed by [18], where extra components are included in the velocity update.) Equations 20 to 24 present the calculation of the stochastic inertia weight (w) applied to the velocity update. These conditional rules define the probability of w occurring outside the standard limit. In the context of this work, the selected threshold value was 0.7.
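The four-component velocity update can be sketched in the spirit of Equation 19 as follows; this one-dimensional sketch gates each memory component by its PAF flag and scales it by its TMR weight, with all names being illustrative.

```python
import random

def fc_cpmso_velocity(x, v, p_l, p_c, p_g, p_s, w, C, rho, phi):
    """FC-CPMSO velocity update in the spirit of Equation 19: up to four
    memory components -- local (l), counterpart (c), swarm global (g),
    all-swarms global (s) -- each gated by its PAF flag rho[...] (0/1)
    and scaled by its TMR weight phi[...]. One-dimensional sketch."""
    refs = {'l': p_l, 'c': p_c, 'g': p_g, 's': p_s}
    new_v = w * v
    for key, ref in refs.items():
        if rho[key]:                             # component available (PAF)
            r = random.random()                  # U(0, 1)
            new_v += C[key] * phi[key] * r * (ref - x)
    return new_v
```

With all four PAF flags off, the update degenerates to pure inertia; with all flags on and equal TMR weights, all four references pull on the particle simultaneously.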
It is important to emphasize that such limits have their parameters adjusted to smaller values after the execution of 1/3 of the iterations (see Equation 20), with the inertia weight being generated stochastically within limits that tend to decrease. Such variation allows a more adequate velocity control considering the algorithm's convergence behaviour, thus avoiding a deterioration of the algorithm's exploration or exploitation capacity in the occurrence of successive values of w close to the minimum and maximum limits, respectively.
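The shrinking-interval scheme just described can be sketched as follows; the concrete interval limits are illustrative assumptions, while the 0.7 threshold governing draws outside the standard band and the 1/3-of-iterations switch come from the text.

```python
import random

def stochastic_inertia(t, t_max, threshold=0.7):
    """Stochastic inertia weight: draw w uniformly inside limits that
    shrink after 1/3 of the iterations; with probability 1 - threshold
    the draw is taken from an extended interval instead. The interval
    values themselves are illustrative assumptions."""
    if t < t_max / 3:
        lo, hi, lo_ext, hi_ext = 0.4, 0.9, 0.3, 1.0
    else:
        lo, hi, lo_ext, hi_ext = 0.3, 0.6, 0.2, 0.7   # tightened limits
    if random.random() > threshold:                    # leave the standard band
        return random.uniform(lo_ext, hi_ext)
    return random.uniform(lo, hi)
```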
To illustrate the proposed stochastic inertia weight procedure, Figure 9 shows the variation of the limits in each iteration interval, as well as an example of w values obtained by a uniform random number generator. The areas shaded in red, blue and purple correspond to the standard (solid lines) and extended (dotted lines) intervals over the iterations (t). After obtaining the inertia weight (w), the next procedure involves setting the acceleration coefficients (i.e. C_l, C_c, C_g, C_s), which are obtained through stochastic functions intrinsically dependent on the control factors TMR and PAF. Equations 25 to 28 introduce the calculation of the probabilistic access factor (PAF) for the following components: local (ρ_l), counterpart (ρ_c), global (ρ_g) and all swarms (ρ_s).
Based on the experiments conducted by Miranda, Keko and Duque [59], a communication probability between 0.1 and 0.4 led to better performance than a fixed communication topology equal to 1. In this work, the limit value that defines the presence or absence of components in the velocity update equation is 0.3 (see Equations 25 to 28).
Another important variable associated with PAF is the number of components available at the time the velocity update is executed. For the FC-CPMSO algorithm, the value of N_C determines how many acceleration coefficients (C_l, C_c, C_g, C_s) will be used in the particle's movement. Equations 29 to 32 establish the presence (1) or absence (0) of each isolated component (S_l, S_c, S_g, S_s), while Equation 33 calculates the number of accessible components (N_C).
The results of the ceiling functions in Equations 29 to 32 ensure that only two values can result from those operations: 0 or 1. In the case where a TMR variable (i.e. φ_l, φ_g) has a value equal to 0.5, the ceiling function ensures the rounding of the product to the next integer value: 1.

FIGURE 10. Poli's region [66] for convergence analysis; parameter combinations inside the parabolic region present convergent behavior [60].
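A minimal sketch of the presence/absence mechanism described above, assuming each component is kept when its PAF draw falls below the 0.3 threshold, with the ceiling trick guaranteeing a strictly binary flag. The exact products inside the ceiling in Equations 29 to 32 are not reproduced in the text, so this formulation is an assumption.

```python
import math

PAF_THRESHOLD = 0.3  # limit value from the text (Miranda et al. suggest 0.1-0.4)

def component_flags(rhos, threshold=PAF_THRESHOLD):
    """Presence (1) / absence (0) flags for the four components.

    `rhos` are the PAF draws (rho_l, rho_c, rho_g, rho_s) in U(0,1).
    The ceiling guarantees a binary result, mirroring Eqs. 29-32.
    """
    return [math.ceil(max(0.0, threshold - r)) for r in rhos]

def available_components(rhos):
    """Flags plus the component count N_C of Equation 33."""
    flags = component_flags(rhos)
    return flags, sum(flags)
```

For example, draws of (0.1, 0.5, 0.25, 0.9) would keep only the local and global components, giving N_C = 2.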
After checking the components' availability (Equations 29 to 32), the acceleration coefficients are calculated using a custom stochastic function (expressed by Equations 34 to 37), which is based on uniform random number generators (U(0, 1)). In order to establish a parameter combination suited to the convergence criteria observed in Poli's theoretical analysis [60], [66], [67] (see Figure 10), the proposed calculations for the acceleration coefficients (C_l, C_c, C_g, C_s) are designed to minimize the occurrence of values that, in combination with the inertia weight (w), can generate divergent behavior in the optimization process.
Considering the limits for the inertia weight (i.e. w min = 0, w max = 1), the sum of values related to acceleration coefficients must not be greater than 2 (see Figure 10). These specifications were adopted in order to reduce the occurrence of divergent combinations (red hatched area). In the context of the FC-CPMSO algorithm, Poli's theoretical analysis was adapted to deal with four acceleration coefficients instead of two. As a direct result of these modifications, the values of C l , C c , C g , C s tend to oscillate between dynamic sub-intervals within the fixed limits of 0 to 1, which are defined according to the number of available components at the time of velocity update calculation.
Therefore, the lower bounds (C_l min, C_c min, C_g min, C_s min) and upper bounds (C_l max, C_c max, C_g max, C_s max) applied to the acceleration coefficients are defined by Equations 38 to 45. The main reason why the stochastic approach was selected over other methods (e.g. linear, nonlinear, self-adaptive, chaotic) is found in the empirical analysis of the inertia weight (w) conducted by Harrison, Engelbrecht and Ombuki-Berman [60]. In that study, eighteen inertia weight strategies were evaluated, with the local and global acceleration coefficients (C_l, C_g) assuming a constant value (C_l = C_g = 1.496180). The results indicate better performance for stochastic (non-static) and constant (static) methods in the context of the inertia factor [60].
For the acceleration coefficients C_l, C_c, C_g and C_s, the stochastic variations tend to alternate each component's relevance every time the velocity values are updated. In a scenario with lower acceleration coefficient values (varying between 0 and 1), a smoother particle movement is expected, and consequently a lower convergence rate when compared to optimization schemes with high values of C_l, C_c, C_g and C_s. These effects tend to be attenuated by the combined use of the multi-swarm approach with counterpart particles. It is noteworthy that the movement of the particle over short distances allows a more detailed exploration of the search space.
As previously mentioned, the limits are determined in accordance with Poli's theoretical analysis and applied to the acceleration coefficients (0.0 ≤ (C_l + C_c + C_g + C_s) ≤ 2.0) and inertia weight (0.0 ≤ w ≤ 1.0), which provides combinations of values that are more likely to converge (see Figure 10).
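One simple way to satisfy the sum constraint above is to draw each of the N_C available coefficients from U(0, 2/N_C), so the total can never exceed 2. This per-component bound is an assumption for illustration; the article's dynamic sub-intervals (Equations 38 to 45) may differ.

```python
import random

def acceleration_coefficients(flags):
    """Sketch of Eqs. 34-37 under 0 <= C_l + C_c + C_g + C_s <= 2.

    Assumption: each available component (flag == 1) is drawn from
    U(0, 2 / N_C); absent components contribute 0.
    """
    n_c = sum(flags)
    if n_c == 0:
        return [0.0] * len(flags)
    upper = 2.0 / n_c                 # per-component bound so the sum <= 2
    return [random.uniform(0.0, upper) if f else 0.0 for f in flags]
```

With all four components active, each coefficient is limited to 0.5, keeping the combination inside the convergent region of Figure 10 for any w in [0, 1].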
Based on the aforementioned improvements, the particles submitted to the optimization process in the FC-CPMSO algorithm not only perform the search based on control parameters that vary at each iteration, but the structure of the velocity update calculation can also vary, according to the availability of the components used to access a particle's j-th variable (velocity). These features allow new combinations of solutions to be generated by multiple approaches within a single particle.
Pseudocode 2 presents the FC-CPMSO algorithm. The red and blue lines highlight the changes that differentiate the proposed algorithm from the classic PSO.
Obtain the i-th particle's local best (Pl, fPl, vPl);
if k == (K − 1) then
    Obtain the counterparts' local best (Pc, fPc, vPc);
    Obtain the k-th swarm's global best (Pg, fPg, vPg);
    Obtain the all swarms' global best (Ps, fPs, vPs);
for t ← 1 to (T − 1) do
    for k ← 0 to (K − 1) do
        for i ← 0 to (P − 1) do
            Tolerance-based Migration Rules TMR (φ_l, φ_g):
                Local migration TMR (φ_l: Eqs. 9 to 11);
                Global migration TMR (φ_g: Eqs. 12 to 14);

FIGURE 11. Limits (min and max) for BLX crossover (V1/V2), based on the particle's direction and its displacement range.

D. A DIRECTION-BASED CUSTOMIZED BLX SCHEME FOR LOCAL SEARCH (FC-CPMSO-V1/V2)
Seeking to minimize the incidence of particle stagnation in local optima, as well as to expand the search capacity in terms of exploration and exploitation, we propose the use of variant particles (VP), generated through a direction-based modified version of the BLX crossover. The limits for V1 and V2 are calculated using Equations 48 and 49, where a_1, a_2, b_1 and b_2 define the ranges for the uniform random number generators (U(a_1, a_2), U(b_1, b_2)), which establish the actual locations of the upper and lower bounds of the search subspace. When analyzing Equations 48 and 49, it can be seen that the main difference between V1 and V2 lies in a_1, a_2, b_1 and b_2: a combination of different values for those variables implies a different search behaviour. For the algorithms implemented in the context of this article, the user-defined configurations illustrate the scheme shown in Figure 11, with a procedure for exploitation (V2) and another for exploration (V1). After the calculation of the search subspace limits A and B, the new position values associated with the variant particles (V1 and/or V2) are obtained through Equation 54.
In evolutionary algorithms based on BLX-α crossover recombination, two individuals generate offspring based on their alleles, which also provide the search limits. In the case of the FC-CPMSO-V1/V2 algorithms, the differences between the previous and current positions (the particle's displacement) define the lower and upper boundaries. In short, the position values of the "parents" used to generate V1 and V2 are selected within the range established by Equation 54.
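For reference, the classic BLX-α operator mentioned above draws the offspring uniformly from the parents' interval extended by α on each side (Eshelman and Schaffer's blend crossover). A per-gene sketch:

```python
import random

def blx_alpha(p1, p2, alpha=0.5):
    """Classic BLX-alpha recombination for one gene: the offspring
    is drawn uniformly from [lo - alpha*d, hi + alpha*d], where
    lo/hi are the parents' values and d their distance."""
    lo, hi = min(p1, p2), max(p1, p2)
    d = hi - lo
    return random.uniform(lo - alpha * d, hi + alpha * d)
```

With parents 0 and 1 and α = 0.5, the child always lies in [-0.5, 1.5]; the FC-CPMSO-V1/V2 variants replace the two parents with displacement-derived limits.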
The fitness associated with the variant particles (VPs) is compared with that of their respective original versions. If V1 or V2 obtains better results than the original particle, the latter is discarded and replaced by the fittest variant. This new particle then has its velocity values recalculated based on Equation 55, having its trajectory modified in relation to its previous location.
The use of local search based on the directed BLX crossover allows a refined search (exploitation) in a subspace closer to the current particle's position (second variant, V2). Concurrently, the information associated with the particle's displacement is used to establish a search region in the gradient direction (first variant, V1). In both cases, there is a small probability that V1 and V2 (Figure 11) receive values opposite to the specified trend, which allows for a wider range of possible combinations.
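Since Equations 48, 49, 54 and 55 are not reproduced here, the sketch below assumes the subspace limits A and B are built by scaling the particle's displacement with draws from U(a1, a2) and U(b1, b2), and pairs the variant generation with the greedy replacement described above (the velocity recalculation of Equation 55 is omitted). All names are illustrative.

```python
import random

def variant_position(x, delta, a, b):
    """Direction-based BLX variant for one variable (sketch).

    `x` is the current position, `delta` the last displacement; the
    subspace limits (Eqs. 48/49) are assumed to be x + U(a1,a2)*delta
    and x + U(b1,b2)*delta, and the new value is drawn uniformly
    between them (Eq. 54 analogue).
    """
    lim1 = x + random.uniform(*a) * delta
    lim2 = x + random.uniform(*b) * delta
    lo, hi = min(lim1, lim2), max(lim1, lim2)
    return random.uniform(lo, hi)

def greedy_replace(x, fx, variants, fitness):
    """Keep the fittest of the original and its variants (minimization)."""
    best_x, best_f = x, fx
    for v in variants:
        fv = fitness(v)
        if fv < best_f:
            best_x, best_f = v, fv
    return best_x, best_f
```

With tight ranges such as a = (0.0, 0.5) the variant stays near the current position (exploitation, like V2), while ranges such as b = (1.0, 1.5) push it along the displacement direction (exploration, like V1).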
As can be seen in Algorithm 2 (in blue), the use of the local search methods V1/V2 in the context of the PSO algorithm adds a direction-based crossover stage with up to three solutions per particle (original, V1, V2), followed by fitness comparison procedures, thus providing a local search engine with low impact on computational complexity.

IV. IMPLEMENTATION, EXPERIMENTS AND RESULTS
To perform benchmark tests and assess the impact of the proposed mechanisms, we evaluate the proposed algorithms on ten constrained engineering functions widely found in the scientific literature [68]-[74].
A common attribute of optimization problems in the engineering field is the presence of constraints [75]. They are associated with the limits that the value of a variable can assume, or with the relationships between variables. Thus, a constrained global optimization problem is formalized by Equation 56 [76].

18: In the case of constrained optimization problems, the evaluation is made based on the following priority order: 1) number of constraint violations; 2) fitness value (minimization or maximization).
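Equation 56 itself did not survive extraction; a reconstruction of the standard constrained global optimization form it refers to [76], consistent with the domain definitions given in the text, is:

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^n} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \dots, m, \\
& h_j(x) = 0, \quad j = 1, \dots, p, \\
& x_{\min} \le x \le x_{\max},
\end{aligned}
```

where the inequality constraints g and equality constraints h match the mappings f(x): R^n → R, g(x): R^n → R^m and h(x): R^n → R^p stated in the text.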
Here f(x): R^n → R, g(x): R^n → R^m, h(x): R^n → R^p. Another important stage in the particle's evaluation is the fitness normalization (nf[i][k]). This process follows Deb's rule [54], where violated solutions are converted into a single value obtained through a penalty function. Equations 57 to 61 show the normalization process, where E represents the set of solutions without violations (feasible), and fPy[k] is the worst feasible real global fitness value. All six algorithms previously described were implemented using parallel computing on the CUDA architecture, where each particle is executed by a thread and each swarm is organized as a block of threads; the set of swarms is represented by a grid structure. Figure 12 illustrates this structure applied to a PSO algorithm in a single-dimensional CUDA grid.
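Returning to the fitness normalization above: Deb's feasibility rule can be sketched as a penalty-style mapping in which feasible solutions keep their real fitness, while infeasible ones are placed above the worst feasible value by their total constraint violation, so any feasible solution beats any infeasible one under minimization. The exact form of Equations 57 to 61 may differ from this sketch.

```python
def deb_normalized_fitness(f, violations, worst_feasible):
    """Deb's-rule-style normalization (sketch, minimization).

    `violations` are the constraint values g_i(x); only positive
    entries count as violations. Feasible solutions keep their real
    fitness; infeasible ones map above `worst_feasible`.
    """
    total_violation = sum(max(0.0, v) for v in violations)
    if total_violation == 0.0:
        return f                       # feasible: real fitness
    return worst_feasible + total_violation
```

This ordering is exactly the evaluation priority stated in footnote 18: number of violations first, fitness value second.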

FIGURE 12.
Organization of the execution flow using the CUDA architecture (threads and blocks) applied to the PSO algorithms implemented in this paper. The distribution of data for each line of execution is given by a 1 to 1 relation: One particle, One thread; One swarm, One block of threads; One multi-swarm environment, One grid of blocks.
In the parallel model used by the FC-CPMSO and FC-CPMSO-V1/V2 algorithms, each particle is executed concurrently in a synchronous approach, using the data from other swarms obtained in the previous iteration (t). In terms of data organization in threads, each instance has access to exclusive indexes in parallel, so as to avoid race conditions, overwriting and data loss. Figure 13 illustrates the organization of vectors for each thread. Observing the structure of the PSO algorithms, it is possible to apply the complete GPU parallelization model described by Tan [5]; thus, swarm initialization and optimization occur without using the CPU for tasks directly associated with the PSO. A second grid with only one block is subsequently executed to obtain the local counterpart particles and the best result achieved by the multi-swarm approach. The parameters used in the tests were selected with the purpose of boosting performance for all six algorithms; those configurations can be verified in Table 1. Tests were conducted on a machine with the following hardware configuration: Apple MacBook Pro mid-2012 with an NVIDIA GeForce GT 650M GPU (1024 MB GDDR5 VRAM, 900 MHz clock, 384 CUDA cores); Intel Core i7 "Ivy Bridge" 2.9 GHz CPU; 16 GB DDR3L RAM clocked at 1600 MHz. The operating system was macOS Sierra 10.12.6, with CUDA driver version 9.1.

20: Compute Unified Device Architecture.
Regarding the ten constrained optimization problems (COPs) selected for evaluation and comparative analysis, each one is described in Figure 14; for example, the minimization of the weight of a tension/compression spring features four constraints describing the minimal deflection, shearing tension, surge frequency and external diameter limits [85], [86]. More details about the ten COPs used for benchmarking, including their mathematical functions, can be found in the authors' previous work, Lisboa et al. [29].

21: Regarding the minimum and maximum limits for the acceleration coefficients (i.e. C_l, C_c, C_g, C_s), Equations 38 to 45 describe how those limits are obtained.
22: The interval variables applied in the context of the modified BLX crossover for local search in V1 and V2 (i.e. a_1, a_2, b_1, b_2) are defined arbitrarily by the user; the values used in the tests are found in Equations 50 to 53.

The ranking of the best observed results, for the mean and best fitness statistics of the proposed mechanisms (i.e. counterpart particles, migration rules with degree of restrictive tolerance, probabilistic access factor, generation of variant particles by direction-based BLX crossover, tolerance thresholds for inertia weight and acceleration coefficient limits), is displayed in Table 4 (mean rank and best results). There, it can be observed that the combination of all resources (FC-CPMSO-V1/V2) provided the best mean results in nine functions (i.e., DPV1, MWTCS, WBD1, WBD2, HNO1, HNO2, SRD11), and the best fitness values in all constrained problems. In the case of DPV2, the FC-CPMSO-V1 algorithm achieved the best results in terms of mean and median.
Regarding the algorithms that did not use counterpart particles to exchange information between swarms (i.e. the MSO-ST variants), their results are reported in Tables 2 and 3. Finally, it is important to emphasize that, with the exception of WBD2, HNO1 and HNO2, the proposed algorithms outperform the methods published in the scientific literature and used for comparison. That statement is made based on the best results found during the experiments.

23: In the result Tables 2 and 3, these algorithms are identified as "Others".

1) Statistical Analysis (p-Test)
Non-parametric tests were performed using the data collected from the experiments. For the p-test calculation, the Mann-Whitney U method was selected: statistical significance (non-equivalence) of one algorithm in relation to another is verified when p < 0.05; otherwise, both solutions are considered equivalent, with no statistical difference, and the null hypothesis (in bold, see Table 6) must be retained.
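The Mann-Whitney U test used above can be sketched from first principles: rank the pooled samples, compute U from the rank sum, and obtain a two-sided p-value via the normal approximation. This is a minimal sketch (no tie-variance or continuity correction), not a replacement for a statistics library.

```python
import math

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic with a normal-approximation
    two-sided p-value; significance at p < 0.05 as in the article."""
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)
    # average rank for each distinct value (handles ties)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        avg = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_of[pooled[k]] = avg
        i = j
    r1 = sum(rank_of[v] for v in x)
    u1 = r1 - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2.0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided p-value
    return u1, p
```

For tiny samples such as [1, 2] versus [3, 4] the approximation is too weak to reach significance; clearly separated larger samples fall below the 0.05 threshold.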
The data displayed in Table 3 were obtained from the mean of the best observed fitness of each run, with all algorithms being executed with the same set of initial populations. Based on the collected results, statistical relevance for the FC-CPMSO-V1/V2 algorithm can be observed when compared with FC-CPMSO-V1 (single variant particle) in 7 of the 10 engineering functions. For the remaining comparisons, p-values below 0.05 are detected in all evaluations.
P-values for the MSO-ST-V1/V2 algorithm indicate statistical relevance in 8 optimization functions (i.e., HNO1, HNO2, SRD11, WBD1, WBD2, MWTCS, TBT, TCD); for the constrained functions DPV1 and DPV2, no statistical significance was achieved when comparing MSO-ST-V1/V2 (two variant particles) and MSO-ST-V1 (one variant particle). In most cases, based on the tests performed, it is possible to conclude that the use of counterpart particles together with rule-based access procedures (i.e., TMR, PAF) significantly improves the performance, with the modified BLX crossover methods providing an additional optimization layer.

2) Graphs (Convergence)

The following graphs illustrate the behavior of the mean over all runs, for assessment of convergence throughout the iterations for each algorithm listed in Table 4. The data collected from the experiments were normalized using Equations 57 to 61 in order to adjust results with violations for the plots. Equation 62 is used to generate the convergence lines displayed in Figure 15.

Mean[t] (Equation 62): the mean of the normalized best fitness over all runs at iteration t.

Based on the graphs found in Figure 15, the following observations can be made:
1) The algorithms that used the counterparts' inter-swarm topology (i.e. FC-CPMSO, FC-CPMSO-V1, FC-CPMSO-V1/V2) were the ones that obtained the best fitness values in all benchmark functions. It was also verified that the use of counterpart particles (Pc) and the multi-swarm global best (Ps) increased the convergence speed in all constrained optimization problems.
2) Modest improvements were observed for the algorithms FC-CPMSO-V1 and FC-CPMSO-V1/V2 when comparing them to FC-CPMSO (without variant particles) in six optimization functions (i.e., WBD1, WBD2, SRD11, TCD, HNO1, HNO2), while the difference in the remaining functions was more significant (i.e., MWTCS, DPV1, DPV2, TBT). Based on the collected data, it is possible to infer that the information exchange among swarms through the TMR and PAF methods affects the performance of the modified BLX crossover (V1, V2), given that both mechanisms tend to increase exploration in the search space, which may affect the exploitation aspect of the VPs.
3) A comparative analysis of the FC-CPMSO-V1 and FC-CPMSO-V1/V2 convergence behaviour shows few differences in performance between one and two variant particles. With the exception of DPV1 and TBT, the optimization pattern is very similar (e.g., SRD11, HNO2) for both algorithms. In other cases (e.g., DPV2), the use of a single variant particle (V1) outperformed the results obtained with two variant particles (V1/V2).
4) Regarding the use of the modified BLX crossover in the versions of the PSO algorithm that do not use the inter-swarm topology (i.e. MSO-ST-V1, MSO-ST-V1/V2), significant improvements were detected in relation to the classic, non-modified multi-swarm PSO (i.e. MSO-ST) for all ten optimization functions.

26: The graphs presented in this article were produced using the Matplotlib library [110].
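The per-iteration mean used for the convergence lines (Equation 62) reduces to averaging the normalized fitness of Equations 57 to 61 over all runs; a minimal sketch:

```python
def mean_convergence(norm_fitness_runs):
    """Equation 62 sketch: the convergence line value at iteration t
    is the mean of the normalized best fitness over all R runs.

    `norm_fitness_runs` is a list of R lists, one per run, each
    holding the normalized fitness at every iteration."""
    runs = len(norm_fitness_runs)
    iters = len(norm_fitness_runs[0])
    return [sum(run[t] for run in norm_fitness_runs) / runs
            for t in range(iters)]
```

For two runs with per-iteration values [1, 2] and [3, 4], the convergence line is [2.0, 3.0].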

V. FINAL REMARKS AND CONCLUSION
In this article, a set of new mechanisms was introduced in order to establish a sub-topology for information exchange between swarms (i.e., PAF, TMR), also including a gradient-based crossover used for local search. In the first case, the objective was the inclusion of components from other swarms during velocity updates, in order to amplify the search capacity by incorporating new references for the particle's trajectory, whether from local information (counterpart particles) or global information (best particle among all swarms). In the second case, alternative particles (variants) generated by the direction-based modified crossover are restricted to a local search, in which particles V1 and V2 are generated based on exploration and exploitation schemes. Compared with other implementations of additional components in the velocity update stage [17], [44], [57], [58], the counterpart particles approach is a novel and improved option for data exchange between swarms, where different particles (or combinations thereof), connected by an inter-swarm sub-topology, are shared among swarms through a set of rules (PAF, TMR), thus enhancing the algorithm's exploration capabilities in the search space. Given that the information from other swarms (i.e. counterpart particles, all swarms' global best) was obtained in the previous iteration, a copy of those references is distributed to all swarms, which limits the impact of the proposed mechanisms on computational complexity.
The main reason why the BLX crossover was selected as the foundation of the proposed method (V1/V2) is that BLX is a simple yet efficient approach, which presented good results in previous experiments when applied to constrained optimization problems (COPs) [29]. It is important to emphasize that, as in FC-CPMSO, the use of V1/V2 had a low impact on computational complexity. The experimental data suggest viability and significant improvements for the MSO algorithms enhanced with the inter-swarm topology structure based on the TMR and PAF methods, with the best fitness values obtained in 8 out of 10 instances of constrained problems. With respect to the direction-based BLX crossover (V1, V2), despite its expressive impact when used with MSOs that do not employ counterpart particles and the all swarms' global best (i.e., MSO-ST-V1, MSO-ST-V1/V2), an unsatisfactory convergence behaviour was noticed when comparing FC-CPMSO-V1 and FC-CPMSO-V1/V2. Two possible solutions include the use of additional trajectory information (e.g. the particle's direction tendency over an interval of iterations) and new combinations of boundary parameters (see Equations 50 to 53), in order to establish new search strategies that increase both exploration and exploitation capacities.
Beyond the aforementioned results, additional mechanisms for improving optimization performance in the FC-CPMSO and MSO-ST algorithms are in development. These include a hyperheuristic-based parameter tuning (e.g. inertia weight, acceleration coefficients, boundaries for search subspaces) that uses information related to the swarm's state, such as diversity and a convergence counter.
Future work includes the improvement of the mechanisms proposed in this article. Some upgrades concern the multi-swarm structure, such as the inclusion of a master-slave hierarchy with modified rules for TMR and PAF. For the variant particles and their direction-based crossover, we highlight the addition of a game-based crossover scheme (i.e. GBX2), previously published by the present authors [29]; the use of social interaction has been associated with good results in constrained optimization [29].
Finally, a multi-swarm adaptation of Quantum-Behaved Particle Swarm Optimization (QPSO), enhanced with PAF, TMR and the direction-based crossover, is in development, and its results will be released in a future publication.