A Two-Phase Learning-Based Swarm Optimizer for Large-Scale Optimization

Abstract—In this article, a simple yet effective method, called the two-phase learning-based swarm optimizer (TPLSO), is proposed for large-scale optimization. Inspired by the cooperative learning behavior in human society, TPLSO involves mass learning and elite learning. In the mass learning phase, TPLSO randomly selects three particles to form a study group and then adopts a competitive mechanism to update the members of the study group. Then, all of the particles in the swarm are sorted, and the elite particles with better fitness values are picked out. In the elite learning phase, the elite particles learn from each other to search for more promising areas. A theoretical analysis of the exploration and exploitation abilities of TPLSO is performed and compared with several popular particle swarm optimizers. Comparative experiments on two widely used large-scale benchmark sets demonstrate that the proposed TPLSO achieves better performance on diverse large-scale problems than several state-of-the-art algorithms.


I. INTRODUCTION
THE EVOLUTIONARY algorithm (EA) is a group-oriented random search technique that simulates the evolution of organisms in nature [1]. Compared with traditional optimization algorithms, such as calculus-based methods, EAs impose few requirements on the problem itself (e.g., differentiability) and can effectively address complicated practical problems. Therefore, an increasing number of researchers have focused on this area [2], [3], [34], [39], [42].
Various EAs have been proposed so far to solve a wide range of optimization problems, such as resource allocation [10], [41]; image processing [11]; network planning [9]; and many others [8], [15], [16]. Specifically, particle swarm optimization (PSO), first introduced by Kennedy and Eberhart in 1995 [7], has attracted considerable attention for decades. However, researchers found that PSO easily suffers from premature convergence and therefore performs poorly on complicated multimodal problems [25]. Inspired by nature and human society, many researchers have proposed diverse improvements to PSO, such as PSO with an aging leader and challengers (ALC-PSO) [4] and orthogonal learning PSO (OLPSO) [40], to improve its performance on complicated optimization problems. However, these PSO variants are effective only in low-dimensional spaces. When dealing with high-dimensional optimization problems, the results of the aforementioned algorithms deteriorate dramatically. This phenomenon is called "the curse of dimensionality" [23]. As the dimensionality increases, the search space and the number of local optimal traps grow exponentially [5], [6], which triggers premature convergence. Unfortunately, in recent years, the number of decision variables in many engineering problems has been increasing rapidly [24], for example, in marine underwater signal processing [13], salient-object detection [12], and the training of deep-learning models [17].
Enlightened by human society, some novel learning strategies for PSO have been developed for large-scale optimization. Cheng and Jin proposed a competitive swarm optimizer (CSO) [5] and a social learning particle swarm optimizer (SL-PSO) [6]. These two methods adopt one predominant particle, instead of pbest or gbest, to update the particles. Similar algorithms include the level-based learning swarm optimizer (LLSO) [35] and the segment-based predominant learning swarm optimizer (SPLSO) [36]. These PSO variants largely alleviate premature convergence because they provide higher swarm diversity than the earlier PSO variants. However, premature convergence remains the main challenge for large-scale optimization.
To further alleviate the problem of premature convergence, a two-phase learning-based swarm optimizer (TPLSO) is proposed in this article. In TPLSO, inspired by the learning behavior of human society, each evolution is divided into two phases, namely, a mass learning phase and an elite learning phase. In the mass learning phase, particles with different exploration and exploitation potentials are randomly selected to form a study group, and then a competition strategy is adopted among the group members to determine the winner and the losers. After that, the winner acts as the leader to guide the losers. Through this learning strategy, high diversity can be preserved. During the elite learning phase, some elite particles with good fitness values are picked out by sorting the particles in the population to form a new swarm, and then each elite particle updates its position by learning from two better particles in this swarm; these two better particles are randomly selected. In this phase, the learning among elite particles accelerates convergence, so promising areas can be located as fast as possible. Comparative results against several state-of-the-art algorithms demonstrate the effectiveness of the proposed method.

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
The remainder of this article is organized as follows. Section II briefly reviews related works on large-scale optimization. The details of TPLSO are presented in Section III. A set of experiments is conducted in Section IV to verify the performance of TPLSO. Finally, Section V concludes this article.

II. RELATED WORKS
The problem studied in this article is to minimize f(X), where X = [x_1, x_2, . . . , x_D], D is the dimension of the optimization problem, and x_i is the value of the ith dimension. In recent years, researchers have proposed many methods to solve high-dimensional problems, and these methods can be roughly divided into two categories: cooperatively coevolutionary algorithms (CCEAs) and novel update strategies for classic PSOs [33], [40].

A. Cooperative Coevolutionary Algorithms
The first CCEA, the cooperative coevolutionary genetic algorithm (CCGA), which decomposes a high-dimensional problem into several lower dimensional subproblems, was proposed by Potter [27]. The performance of a CCEA heavily depends on its grouping strategy; thus, researchers have focused on designing grouping methods. Yang et al. [37] proposed a random grouping strategy, which randomly decomposes a D-dimensional problem into m S-dimensional subproblems in each iteration, where S is the dimension of the subproblems and S ≪ D. Combining this strategy with the differential evolution (DE) algorithm [30] yields DECC-G [37], which performs well on some 1000-D problems. Yang et al. [38] dynamically changed the value of S during each evolutionary cycle and proposed a multilevel CC algorithm, namely, multilevel cooperative coevolution (MLCC). On this basis, the cooperatively coevolving particle swarm optimizer (CCPSO2) [19] was proposed.
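The random grouping idea of DECC-G can be sketched in a few lines. The function name and interface below are illustrative, not the authors' code; the only assumption is that D is divisible by S:

```python
import random

def random_grouping(D, S, rng=random.Random(42)):
    """Randomly decompose a D-dimensional problem into m = D/S subproblems
    of S dimensions each; a fresh random permutation is drawn on every call,
    mirroring DECC-G's per-cycle regrouping."""
    indices = list(range(D))
    rng.shuffle(indices)
    return [indices[i:i + S] for i in range(0, D, S)]

groups = random_grouping(1000, 50)  # m = 20 subproblems, S = 50, S << D
```

Each subproblem is then optimized cooperatively while the remaining dimensions are held fixed; re-drawing the permutation every cycle raises the chance that interacting variables eventually land in the same group.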
A new grouping strategy that detects the interdependent decision variables, called differential grouping (DG), was proposed by Omidvar et al. [23]. DG achieves satisfactory performance when detecting interdependent variables. Based on this, its variants, such as DG2 [24], XDG [31], and GDG [21], have been further proposed to detect more complex structures in optimization problems.

B. Novel Learning Strategies for PSO Variants
In PSO, imitating the foraging behavior of a flock of birds, particles in a swarm search the entire solution space to find a globally best solution. In this algorithm, pbest_i and gbest are utilized to guide the learning of all particles, which easily leads to premature convergence. pbest_i is the personal best position of the ith particle in the current generation, and gbest is the best position found by the particle swarm so far. In contrast to the traditional PSOs, inspired by the competition in human society, Cheng and Jin proposed CSO [5], which adopts one predominant particle instead of pbest_i and gbest. In detail, two particles are randomly selected from the swarm for comparison. Then, the loser is updated by learning from the winner, which goes directly to the next generation. In CSO, only the loser is updated, in the following way:

V_l(t + 1) = r_1 V_l(t) + r_2 (X_w(t) − X_l(t)) + ψ r_3 (X̄(t) − X_l(t))
X_l(t + 1) = X_l(t) + V_l(t + 1)

where X_l(t) and V_l(t) are the position and velocity of the loser in the tth generation, X_w(t) is the position of the winner, and X̄(t) is the mean position of the swarm. r_i (i = 1, 2, 3) are random variables within [0, 1], and ψ is the control parameter of X̄(t).
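A minimal sketch of the CSO loser update described above. We assume, as in CSO, that the random coefficients are drawn per dimension; the function name and signature are our own:

```python
import numpy as np

def cso_loser_update(x_l, v_l, x_w, x_mean, psi=0.1, rng=None):
    """One CSO-style loser update: the loser moves toward the winner x_w and
    the swarm's mean position x_mean; the winner is not updated at all."""
    rng = rng or np.random.default_rng()
    D = x_l.shape[0]
    r1, r2, r3 = rng.random((3, D))  # per-dimension random coefficients
    v_new = r1 * v_l + r2 * (x_w - x_l) + psi * r3 * (x_mean - x_l)
    return x_l + v_new, v_new
```

Since only half of the swarm is updated per generation, CSO spends fewer fitness evaluations per particle update than canonical PSO, which partly explains its suitability for high dimensions.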
Algorithms similar to CSO are SL-PSO [6] and LLSO [35]. Observing these algorithms, we can find that the update or learning methods of these optimizers greatly increase the diversity of the swarm, thus, premature convergence may be avoided. Despite the large amount of work involved in dealing with large-scale optimization, premature convergence and falling into local optima are still the main challenges in large-scale optimization.

III. TPLSO

A. Motivation
Studying CSO, LLSO, and SL-PSO, we find that this type of optimizer does not strike a good balance between diversity and convergence: CSO emphasizes diversity and neglects convergence, whereas LLSO and SL-PSO focus on convergence. Therefore, these algorithms can be further improved. To optimize high-dimensional problems more effectively, we seek inspiration from nature and human society. In education, different students usually have different potentials or learning abilities, and their abilities can be quickly improved by learning from each other; in particular, the group cooperative learning model is widely used in educational practice [14], [29]. Similarly, in swarms, different particles have different exploration and exploitation abilities when traversing the objective space and thus should be treated differently. Enlightened by this, a mass learning strategy is proposed in TPLSO, which randomly selects several particles from the swarm to form a group and then adopts a competition mechanism to update the group members. In real life, a team's strength is often further enhanced by improving the ability of its leaders because smart leaders can better guide the team. Thus, an elite learning method is introduced to improve the fitness values of the elite particles, which have better fitness values than the others in the swarm. In this phase, elite particles are selected and then updated by learning from any two particles better than themselves. Inspired by the above observations, a reinforced competition-based learning strategy for PSO is proposed, namely, TPLSO.

B. TPLSO
In TPLSO, the entire evolution process is divided into two parts, namely, mass learning and elite learning.
1) Mass Learning: During this stage, assume that the number of particles in the swarm is NP; the NP particles are divided into NP/K study groups with K particles per group. Within each group, different particles have different abilities in exploring and exploiting the search space, and these particles update their positions through collaboration and competition strategies. Specifically, the particles compete with each other to determine their ranking in the group; the winner is denoted by W, and the inferior and worst particles are denoted by L_1 and L_2, respectively. Then, L_1 updates its position by learning from W, as in CSO. For L_2, both W and L_1 are used to update its state. The winner passes directly to the elite learning phase, while the losers update their positions and velocities using the following strategy:

V_L2(t + 1) = R_1 V_L2(t) + R_2 (X_W(t) − X_L2(t)) + ϕ R_3 (X_L1(t) − X_L2(t))    (3a)
V_L1(t + 1) = R_1 V_L1(t) + R_2 (X_W(t) − X_L1(t)) + ϕ R_3 (X̄(t) − X_L1(t))    (3b)
X_Li(t + 1) = X_Li(t) + V_Li(t + 1), i = 1, 2    (3c)

where X_W, X_Li, V_W, and V_Li represent the positions and velocities of the winner and losers of the competition in each group. It is worth noting that L_1 is the winner when compared with L_2. R_1, R_2, and R_3 are three random variables within [0, 1]. ϕ is a parameter within [0, 1] that controls the influence of X_L1 or X̄. X̄ is the mean position, for which a global version and a local version can be adopted in this article:
1) X̄_g is the mean position of all particles in the swarm in each generation.
2) X̄_l is the mean position of each group in each generation.
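The mass-learning update for one study group of K = 3 particles can be sketched as follows. This is a sketch under our own assumptions: random coefficients are drawn per dimension, exemplars are taken from generation t (before any within-group update), and X̄ is the local (group) mean:

```python
import numpy as np

def mass_learning_group(X, V, f, phi=0.15, rng=None):
    """Competitive update within one study group of K = 3 particles.
    X, V: (3, D) arrays of positions and velocities; f: fitness function.
    The winner W is untouched; L1 learns from W and the group mean
    (Eqs. (3b)/(3c)); L2 learns from W and L1 (Eqs. (3a)/(3c))."""
    rng = rng or np.random.default_rng()
    D = X.shape[1]
    w, l1, l2 = np.argsort([f(x) for x in X])  # ascending fitness: W, L1, L2
    x_bar = X.mean(axis=0)                     # local version of the mean position
    x_w, x_l1 = X[w].copy(), X[l1].copy()      # exemplars from generation t
    R1, R2, R3 = rng.random((3, D))
    V[l1] = R1 * V[l1] + R2 * (x_w - X[l1]) + phi * R3 * (x_bar - X[l1])
    X[l1] = X[l1] + V[l1]
    R1, R2, R3 = rng.random((3, D))
    V[l2] = R1 * V[l2] + R2 * (x_w - X[l2]) + phi * R3 * (x_l1 - X[l2])
    X[l2] = X[l2] + V[l2]
    return X, V
```

Note that the winner's position and velocity are never touched, which is what lets it carry its information directly into the elite learning phase.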
2) Elite Learning: During the elite learning phase, the particles in the swarm are sorted in ascending order of fitness value. Afterward, the first N particles of this swarm are selected to form a new swarm, denoted by P_h, and the remaining particles pass directly to the next generation. The swarm size of P_h is set as N = NP/2, where NP is the original swarm size. Observing the particles in P_h, we can see that these N particles are the elite ones, which have better fitness values than the remaining particles in the original swarm. Consequently, these particles usually possess more beneficial information to guide the learning of other particles and are more likely to be close to the global optimum area. In P_h, particle j randomly selects two better particles, r_1 and r_2, from the current swarm and then updates its position and velocity via

V_j(t + 1) = R_1 V_j(t) + R_2 (X_r1(t) − X_j(t)) + ϕ R_3 (X_r2(t) − X_j(t))    (4)
X_j(t + 1) = X_j(t) + V_j(t + 1)    (5)

where j, r_1, r_2 ∈ [1, N] are three particle indices; specifically, r_1 and r_2 are the indices of the two randomly selected exemplars. The other parameters are set as previously.
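A sketch of the elite-learning step of Eqs. (4) and (5) for one particle, assuming P_h is already sorted in ascending order of fitness and that the random coefficients are per dimension, as before:

```python
import numpy as np

def elite_learning_update(P, V, j, phi=0.15, rng=None):
    """Update elite particle j (0-based row index, j >= 2) in the sorted
    swarm P_h by learning from two randomly selected better particles
    with indices r1 < r2 < j."""
    rng = rng or np.random.default_rng()
    D = P.shape[1]
    r1, r2 = np.sort(rng.choice(j, size=2, replace=False))  # both rank above j
    R1, R2, R3 = rng.random((3, D))
    V[j] = R1 * V[j] + R2 * (P[r1] - P[j]) + phi * R3 * (P[r2] - P[j])  # Eq. (4)
    P[j] = P[j] + V[j]                                                  # Eq. (5)
    return P, V
```

Because the exemplars must rank strictly above j, the loop over the elite swarm starts at the third-best particle; the two best elites are never updated in this phase.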
Note that r_1 < r_2 < j indicates that particle j ranks below r_2 and that both rank below r_1; thus, X_j is worse than X_r2, and both are worse than X_r1. Mutual learning between the elite particles further improves their fitness values so that fast convergence can be achieved.

Combining the above two learning phases, the framework of TPLSO is displayed in Fig. 1, and the pseudocode of TPLSO is summarized in Algorithm 1.

Fig. 1. Framework of TPLSO. The entire evolution process is divided into two phases: mass learning (upper part) and elite learning (lower part). In the mass learning phase, three particles are randomly selected from P(t) for competition; the losers, whose fitness values are worse, are updated by learning from the winner, and the winner is passed directly to the elite learning phase. Afterward, the particles in P_1(t) are sorted in ascending order of fitness value. Then, in the elite learning phase, the first N particles are selected to form a new swarm P_h, in which each particle updates its position by learning from two randomly selected particles with better fitness values.

Algorithm 1 Pseudocode of TPLSO
1: Initialize the swarm and the velocities;
2: while the termination criterion is not met do
3: //the first phase: Name the swarm at this phase as P_1
4: Calculate the fitness of all particles in P_1;
5: Divide all particles into NP/K groups;
6: Identify the winner (W) and the losers (L_1, L_2) based on the competitive mechanism, with f(X_W) < f(X_L1) < f(X_L2);
7: Update X_L1 according to Eq. (3b) and Eq. (3c);
8: Update X_L2 using Eq. (3a) and Eq. (3c);
9: Add X_W, X_L1, and X_L2 into the next phase;
10: //the second phase: Name the swarm at this phase as P_2
11: Sort the particles in P_2 in ascending order of fitness value;
12: Select the N best particles in P_2 to form a new swarm, namely, P_h;
13: for j = 3, . . . , N do
14: Select two particles from levels [1, j − 1]: r_1, r_2;
15: if r_1 > r_2 then
16: Swap(r_1, r_2);
17: end if
18: Update particle X_j using Eq. (4) and Eq. (5);
19: end for
20: end while
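Algorithm 1 can be turned into a compact, runnable sketch. The initialization scheme, boundary handling by clipping, and the sphere objective used below are our own assumptions; they are not specified in the text:

```python
import numpy as np

def tplso(f, D, lo, hi, NP=30, K=3, phi=0.15, max_fes=6000, seed=0):
    """Minimal sketch of TPLSO (Algorithm 1); NP must be divisible by K."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, (NP, D))
    V = np.zeros((NP, D))
    fes = 0
    while fes < max_fes:
        # phase 1: mass learning over NP/K random study groups
        fit = np.array([f(x) for x in X]); fes += NP
        for g in rng.permutation(NP).reshape(-1, K):
            w, l1, l2 = g[np.argsort(fit[g])]
            x_bar = X[g].mean(axis=0)
            x_w, x_l1 = X[w].copy(), X[l1].copy()
            for li, ex in ((l1, x_bar), (l2, x_l1)):        # Eqs. (3b)/(3a)
                R1, R2, R3 = rng.random((3, D))
                V[li] = R1*V[li] + R2*(x_w - X[li]) + phi*R3*(ex - X[li])
                X[li] = np.clip(X[li] + V[li], lo, hi)      # Eq. (3c)
        # phase 2: elite learning among the best N = NP/2 particles
        fit = np.array([f(x) for x in X]); fes += NP
        order = np.argsort(fit)
        N = NP // 2
        for rank in range(2, N):                            # j = 3, ..., N
            j = order[rank]
            r1, r2 = np.sort(rng.choice(rank, size=2, replace=False))
            R1, R2, R3 = rng.random((3, D))
            V[j] = (R1*V[j] + R2*(X[order[r1]] - X[j])      # Eq. (4)
                    + phi*R3*(X[order[r2]] - X[j]))
            X[j] = np.clip(X[j] + V[j], lo, hi)             # Eq. (5)
    fit = np.array([f(x) for x in X])
    return X[np.argmin(fit)], float(fit.min())
```

A quick way to exercise the sketch is to minimize the sphere function in a box, e.g. `tplso(lambda x: float(np.sum(x*x)), D=10, lo=-5.0, hi=5.0)`.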

C. Theoretical Analysis
Exploration and exploitation play important roles when the particles traverse the search space, and a good optimizer should compromise between these two aspects. Here, we investigate the exploration and exploitation capacities of TPLSO by comparing it with LLSO [35] and the global-version PSO (GPSO) [7].
1) Exploration Ability: For an EA, improving the exploration ability is equivalent to promoting swarm diversity so that the algorithm can avoid falling into local areas and find more promising areas. Exploration is particularly important for high-dimensional and multimodal problems with a large number of local traps. The observation of (3a), (3b), and (4) indicates that the learning exemplars in (3a) and (4) are two different individuals, whereas the exemplars in (3b) are the winner and the mean position of the corresponding group. According to the theoretical analysis in LLSO [35], regarding (3a) and (4), it is easy to conclude that TPLSO has the same excellent exploration ability as LLSO and higher diversity than classic PSOs, such as GPSO, that utilize pbest_i and gbest to guide particle learning. To further investigate the exploration ability of TPLSO, let X̄ be the mean position of each group in this section; then, (3b) can be rewritten as follows:

V_L1(t + 1) = R_1 V_L1(t) + p_1 (X_W(t) − X_L1(t)) + p_2 (X_L1(t) − X_L2(t))    (6a)

with p_1 = R_2 + ϕR_3/3 and p_2 = −ϕR_3/3, where R_1, R_2, R_3, and ϕ are positive numbers; thus, we have p_1 > 0 and p_2 < 0. Similarly, we can write the update formula of LLSO as follows:

V_i,j(t + 1) = r_1 V_i,j(t) + r_2 (X_rl1,k1(t) − X_i,j(t)) + ϕ r_3 (X_rl2,k2(t) − X_i,j(t))    (7)

According to the derivation of LLSO [35], the first term on the right-hand side of (7) is a positive number, and the other two terms are negative. Therefore, the values of V_i,j and f(X_i,j) continue to decrease so that the particles can constantly search for more promising areas. Unlike (7), the second term on the right-hand side of (6a) is negative, and the other two are positive, so the value of V_L1 may increase; thus, particles may search poorer areas. Compared with LLSO, this update strategy may appear to lead to worse performance, but the truth may be the opposite. Let us consider a situation where A is the start position of the particle, as illustrated in Fig. 2.

Fig. 2. Illustration of the search dynamics of LLSO on a multimodal problem. Point A represents the start position of the particle, and the particle eventually falls into a local optimum.

According to the update method of particles in LLSO, shown in (7), the particles move toward a more promising area in each iteration. In Fig. 2, the particles approach the local optimal area and eventually suffer premature convergence. However, in (6a), the third term on the right-hand side is a positive number, and its sum with the second term may also be positive.
Then, the values of V_L1 and f(X_L1) may increase, indicating that the next position of the particle may be worse than the current position, as shown in Fig. 3.

Fig. 3. Illustration of the search dynamics of TPLSO on a multimodal problem. A is the start position of the particle, B is the particle position in the next generation, and the global optimum can be obtained.

In Fig. 3, the particle moves from position A to position B and finally finds the global optimal area. Therefore, we can infer that the diversity of TPLSO may be better than that of LLSO; thus, TPLSO has a better ability to jump out of local optimal areas and can find more promising areas faster. Furthermore, if X̄ is the mean position of the whole swarm, the same conclusion about the exploration ability holds.
2) Exploitation Ability: Exploitation ability and exploration ability are equally important for EAs because good exploitation allows an EA to quickly locate better areas and thus reduce the optimization time, which is important for problems with limited computational resources and for unimodal problems. To analyze the exploitation ability of TPLSO, two particles X_W and X_L2 are selected from one group in the mass learning phase. By the competitive mechanism, we have

f(X_W) < f(X_L2)

Combining pbest and gbest as defined in classic PSO, the following relationship can be obtained:

f(gbest) ≤ f(pbest_L2) ≤ f(X_L2)

Let X_rl1,k1 denote the exemplar used by LLSO, where X_rl1,k1 is selected from a higher level than X_L2 and is better than all particles below its level, whereas X_W is only better than X_L1 and X_L2. Thus, we have

f(X_rl1,k1) ≤ f(X_W)

Similarly, for a canonical PSO such as GPSO, it holds that

f(gbest) ≤ f(X_i) for every particle i

According to the definition of X_rl1,k1, we achieve the following relationship:

f(gbest) ≤ f(X_rl1,k1)

Combining the above formulas, the following relationship can be derived:

f(gbest) ≤ f(X_rl1,k1) ≤ f(X_W) < f(X_L2)

That is, among the three optimizers, the exemplar of TPLSO (X_W) has the fitness value closest to that of the updated particle X_L2. This shows that TPLSO, compared with LLSO and GPSO, has a better ability to exploit the small gaps between two positions whose fitness values are very similar.

D. Time Complexity Study
Generally, the time complexity is calculated by analyzing the extra time in each iteration without the fitness evaluations [5]. According to LLSO [35]

IV. EXPERIMENTS
In this section, a suite of experiments is conducted to study the performance of TPLSO from different perspectives. Two benchmark sets, namely, CEC'2010 [32] and CEC'2013 [18], are selected here. The CEC'2010 benchmark set consists of separable functions, partially separable functions, and nonseparable functions. Based on CEC'2010, the CEC'2013 benchmark set introduces some new characteristics, such as the imbalanced contribution of subcomponents and overlapping functions. Consequently, the functions in CEC'2013 are much more complicated and harder to optimize than those in CEC'2010. In this article, unless otherwise specified, all statistical results are over 30 independent runs. The maximum number of fitness evaluations for each independent run is set to 3000 × D, where D is the dimension of the optimization problem [35].

A. Parameter Study
In TPLSO, three additional parameters are introduced, namely, the group size K, the swarm size NP, and the control parameter ϕ. First, the setting of K has a crucial impact on the algorithm. Thus, to set the study group size accurately, several experiments are carried out with K = 2, 3, 4, and the statistical results are shown in Table I. Here, ϕ, NP, and D are set as 0.15, 600, and 1000, respectively. From Table I, we can find that K = 3 is the most reasonable setting.
Second, the particles generally tend to exhibit premature convergence with a small swarm size (NP) because a small swarm cannot provide high diversity. In contrast, if NP is set to a large number, more computing resources are needed in each generation, which is impractical for computationally expensive problems. According to CSO [5], there exists a correlation between NP and ϕ. Thus, to set these two parameters properly, experiments are conducted on TPLSO with NP varying from 400 to 800 and ϕ varying from 0.05 to 0.25. Table II shows the statistical results of TPLSO with different combinations of these two parameters on eight CEC'2010 test functions.

TABLE I. Statistical result (mean value on the first line and std value on the second line) of the optimization error obtained by TPLSO on several 1000-D test functions with different settings of K.

These eight functions contain almost all types of problems: fully and partially separable, and unimodal and multimodal. Generally, partially separable functions are more difficult to optimize than fully separable functions, and multimodal functions are more difficult to optimize than unimodal functions. Multimodal and partially separable functions are closer to real-world optimization problems, so more attention should be paid to these types of functions. From Table II, we can see that TPLSO performs better on the unimodal functions when NP = 400, while its advantage on the multimodal functions concentrates on larger swarm sizes, such as NP = 600, because a small swarm size cannot provide high diversity for the swarm.
In summary, NP = 600, ϕ = 0.15, and K = 3 are used for TPLSO on 1000-D. Specifically, ϕ is set to 0.2 for nonseparable functions (such as f 19 and f 20 in CEC'2010 function sets, and f 13 -f 15 in CEC'2013 function sets) for TPLSO on 1000-D because higher diversity is required when optimizing nonseparable functions.

B. Exploration and Exploitation Influence of EAs
Exploration ability is a crucial factor in addressing multimodal problems because good exploration increases the probability that particles jump out of local optima. Exploitation ability affects the convergence speed of the particles; thus, exploitation should be properly biased to seek fast convergence when dealing with unimodal problems [26], [28]. However, exploitation and exploration are often conflicting, so a good EA should balance these two aspects. To verify that TPLSO is able to compromise these two abilities properly, a set of experiments is conducted on four CEC'2010 test functions, and the results are compared with those of CSO and GPSO. The used functions are f_3 (fully separable and unimodal), f_7 (partially separable and unimodal), f_8 (partially separable and multimodal), and f_18 (partially separable and multimodal). The diversity is used to measure the particle exploration ability, and it is defined as [5], [22]

D_P = (1/NP) Σ_{j=1}^{NP} sqrt( Σ_{d=1}^{D} (x_j^d − x̄^d)^2 )

where D_P is the diversity of swarm P, x̄^d = (1/NP) Σ_{j=1}^{NP} x_j^d is the mean value of the dth dimension over all particles in the swarm, x_j^d is the value of the dth dimension of particle j, and NP is the swarm size.
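The diversity measure above can be computed directly; in this sketch, `P` holds one particle per row:

```python
import numpy as np

def swarm_diversity(P):
    """D_P: mean Euclidean distance of the particles from the swarm's
    mean position (the diversity measure used by CSO [5], [22])."""
    x_bar = P.mean(axis=0)  # mean value of each dimension over all particles
    return float(np.linalg.norm(P - x_bar, axis=1).mean())
```

A swarm whose particles have collapsed onto one point has zero diversity, so tracking D_P over the run shows when premature convergence sets in.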
The comparative results of the three algorithms are shown in Fig. 4. The maximum number of FEs is set at 3000 × D, where D is the dimension size. In addition, the swarm size NP is set to 400 for all algorithms here. We obtain the following findings from this figure.
From Fig. 4(a) and (b), for the unimodal functions, we find that the exploitation ability of TPLSO is properly biased on f_3 and f_7; thus, TPLSO reaches better solutions and converges faster than CSO and GPSO. Specifically, on f_7, it can be seen that after FEs > 2.5E+06, the fitness value and diversity of TPLSO change substantially because the high diversity maintained during the mass learning phase allows the particles to jump out of the local trap and then quickly converge in the elite learning phase. The same behavior can be observed on f_8. For the multimodal functions, the exploration ability of TPLSO is properly emphasized so that stagnation and premature convergence can be avoided. Therefore, TPLSO achieves more satisfactory performance than CSO and GPSO on the multimodal functions f_8 and f_18. In summary, we find that TPLSO not only maintains good exploitation and exploration abilities but also compromises these two aspects well during each evolution.
There are two ways to calculate the mean position X̄ in TPLSO, namely, a global version X̄_g and a local version X̄_l. X̄_g can preserve higher diversity than X̄_l. Therefore, to further verify the influence of diversity, several experiments are conducted on the CEC'2010 benchmark functions with the two different mean positions. The variant of TPLSO using X̄_l is denoted by TPLSO-L. The corresponding results are shown in Table III, where a highlighted value indicates that TPLSO-L obtains better performance. From this comparison, we can see that TPLSO-L performs better on the unimodal functions than TPLSO.

C. Comparisons With State-of-the-Art Algorithms
To verify the feasibility of TPLSO, we compare it with a series of state-of-the-art algorithms dealing with large-scale optimization on the CEC'2010 and CEC'2013 function sets with the dimension of 1000. In particular, four popular PSO variant algorithms, including LLSO [35], CSO [5], SL-PSO [6], and dynamic multiswarm particle swarm optimizer (DMS-L-PSO) [20], and three CCEAs, namely, MLCC [38], cooperative coevolution with DG (DECC-DG) [23], and CCPSO2 [19] are selected for comparison. To provide a fair comparison, the key parameters used in each algorithm are set as the recommendations in the corresponding paper.
The statistical results on the CEC'2010 and CEC'2013 test functions are shown in Tables III and IV, respectively. The mean and standard deviation are listed together with the result of a two-tailed t-test at the significance level α = 0.05, which is used to compare two different statistical results; a highlighted value indicates that TPLSO is significantly better than the corresponding algorithm. In addition, w/t/l in the last row indicates that TPLSO wins on w functions, ties on t functions, and loses on l functions.

1) Results on CEC'2010: Table III shows that TPLSO achieves better performance than the comparative algorithms on most of the 20 functions. Compared with the four popular PSO variants, TPLSO shows considerable advantages on 12, 17, 19, and 14 functions, respectively, while it loses on only 3, 2, 1, and 4 functions, respectively. In particular, in contrast with DLLSO, which obtains the best performance among the current PSO variants, TPLSO loses only on two separable and unimodal functions (f_1, f_3) and one partially separable and multimodal function (f_15). It is worth noting that TPLSO ties statistically with DLLSO on five functions (f_4, f_5, f_13, f_18, and f_19), on which TPLSO obtains better mean values. Further observation shows that TPLSO performs better on most partially separable and nonseparable functions, which indicates that TPLSO preserves higher diversity than DLLSO. In comparison with MLCC, CCPSO2, and DECC-DG, TPLSO wins on 16, 19, and 16 functions and loses on only 4, 1, and 1 functions, respectively.
2) Results on CEC'2013: The statistical results of Table IV demonstrate that TPLSO consistently outperforms the comparative algorithms on the CEC'2013 set, where the functions are more difficult to optimize than those of the CEC'2010 set. Compared with the four PSO variants, TPLSO defeats them on 9, 10, 10, and 10 functions. Compared with the three CCEAs, TPLSO shows a significant effectiveness on at least 10 functions.
Observing the results in Tables III and IV, we find that TPLSO achieves better performance in solution quality. The superiority of TPLSO can be attributed to the mass and elite learning strategies and the exemplar-selection method. In the mass learning phase, grouping the particles in the swarm and selecting exemplars within each study group enhance the diversity of the swarm, so particles have a greater chance to jump out of local optima and look for more promising areas. In the elite learning phase, mutual learning between particles further enhances their fitness values so that promising areas can be located as soon as possible. In summary, the compromise between exploration and exploitation achieved by these two phases makes TPLSO perform well.

V. CONCLUSION
In this article, we proposed a new swarm algorithm called TPLSO that is based on mass learning and elite learning strategies. The theoretical analysis shows that these two strategies are able to preserve the exploration and exploitation abilities of TPLSO well. Despite the simplicity of the proposed TPLSO algorithm, the comparative studies on 1000-D CEC'2010 and CEC'2013 test functions showed that it outperforms several state-of-the-art approaches for large-scale optimization.