Adaptive Granularity Learning Distributed Particle Swarm Optimization for Large-Scale Optimization

Large-scale optimization has become a significant and challenging research topic in the evolutionary computation (EC) community. Although many improved EC algorithms have been proposed for large-scale optimization, slow convergence in the huge search space and entrapment in local optima among massive suboptima remain challenging. To address these two issues, this article proposes an adaptive granularity learning distributed particle swarm optimization (AGLDPSO) with the help of machine-learning techniques, including clustering analysis based on locality-sensitive hashing (LSH) and adaptive granularity control based on logistic regression (LR). In AGLDPSO, a master–slave multisubpopulation distributed model is adopted, where the entire population is divided into multiple subpopulations, and these subpopulations are co-evolved. Compared with other large-scale optimization algorithms that use single-population evolution or a centralized mechanism, the multisubpopulation distributed co-evolution mechanism fully exchanges the evolutionary information among different subpopulations to further enhance the population diversity. Furthermore, we propose an adaptive granularity learning strategy (AGLS) based on LSH and LR. The AGLS helps determine an appropriate subpopulation size to control the learning granularity of the distributed subpopulations in different evolutionary states, balancing the exploration ability for escaping from massive suboptima against the exploitation ability for converging in the huge search space. The experimental results show that AGLDPSO performs better than, or at least comparably to, some other state-of-the-art large-scale optimization algorithms, including the winner of the competition on large-scale optimization, on all 35 benchmark functions from both the IEEE Congress on Evolutionary Computation (IEEE CEC2010) and IEEE CEC2013 large-scale optimization test suites.

Zi-Jia Wang, Student Member, IEEE, Zhi-Hui Zhan, Senior Member, IEEE, Sam Kwong, Fellow, IEEE, Hu Jin, Senior Member, IEEE, and Jun Zhang, Fellow, IEEE

I. INTRODUCTION
Evolutionary computation (EC) algorithms, including evolutionary algorithms (EAs) and swarm intelligence algorithms (SIs) [1]-[9], such as genetic algorithm (GA) [10], [11]; differential evolution (DE) [12]-[15]; particle swarm optimization (PSO) [16]-[18]; and ant colony optimization (ACO) [19]-[21], have been widely studied and applied in many real-world optimization problems. However, traditional EC algorithms rapidly lose their effectiveness and advantages as the dimension of the problem increases, which is the so-called "curse of dimensionality" [22]-[26]. This phenomenon has two main causes. On the one hand, the search space grows exponentially with the dimensionality, so traditional EC algorithms have to find the optimal solution in a huge search space, which causes slow convergence. On the other hand, the search space contains massive local optimal regions, so traditional EC algorithms have to further improve the population diversity in order to escape from local optima in such a complex environment.
To tackle large-scale optimization problems (often more than 500 dimensions), a common method is to use a cooperative coevolution (CC) framework in EC algorithms to decompose the high-dimensional problem into several mid- or lower-dimensional subproblems and to solve each subproblem separately. The CC framework was first proposed by Potter and De Jong in 1994 [27]. Due to its effectiveness, researchers have developed many variants by introducing the CC framework into different EC algorithms (CCECs), including CCGA [27], [28]; CCPSO [29], [30]; and DECC [31]-[34]. Even though CCECs have been intensively studied and have achieved considerable success, they also have some limitations. First, the decomposition strategy is the most crucial component of CCECs. Therefore, the performance of CCECs is highly sensitive to the decomposition strategy. Second, a considerable number of fitness evaluations (FEs) are required since each subproblem has to be solved separately, especially when the dimension of the problem is large. Also, in order to improve the accuracy of decomposition, additional FEs are often needed to detect the dependency among variables [33]-[35]. Third, CCECs are only effective for separable problems. When dealing with partially separable or nonseparable problems, the performance of CCECs deteriorates greatly.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Taking the above limitations of CCECs into account, researchers have proposed some novel search strategies for EC algorithms that tackle all dimensions as a whole [36], [37]. Moreover, studies have shown that coevolutionary [38], [39] and distributed [40], [41] mechanisms with multiple populations can improve optimization efficiency.
However, when using the multipopulation coevolutionary mechanism, the size of the population (i.e., the granularity) also needs to be set appropriately. A large population (coarse granularity) and a small population (fine granularity) may be suitable for different evolutionary states in different problems. Many current multipopulation coevolutionary algorithms only use a simple dynamic mechanism. For example, in the dynamic multiswarm PSO (DMS-L-PSO) proposed in [42], the swarm size is randomly changed every certain number of generations. However, a random change can hardly provide the appropriate population size and lacks adaptive granularity control. If we can further introduce adaptive granularity control and find the appropriate population size to meet the search requirements of the different evolutionary states in different problems, the search process will be more effective.
Machine learning (ML) is one of the most promising research areas in artificial intelligence and has become a powerful tool in a wide range of applications [43]. Since EC algorithms store ample data about the search space, problem features, and population information during the iterative search process, ML techniques are helpful in analyzing these data to further enhance the search performance. In this way, useful information can be extracted to analyze the evolutionary state and to achieve adaptive granularity control. Therefore, to further improve the searching ability and achieve an adaptive algorithm, this article develops a novel adaptive granularity learning distributed PSO (AGLDPSO) with the help of ML techniques, including clustering analysis based on locality-sensitive hashing (LSH) and adaptive granularity control based on logistic regression (LR). More specifically, the novelties and advantages of AGLDPSO lie in the following two aspects.
1) The master-slave multisubpopulation distributed model is adopted in AGLDPSO, where the entire population is randomly divided into multiple subpopulations and these subpopulations are co-evolved. Compared with other large-scale optimization algorithms that use single-population evolution or a centralized mechanism, the multisubpopulation distributed co-evolution mechanism fully exchanges the evolutionary information among different subpopulations to further enhance the population diversity.
2) An adaptive granularity learning strategy (AGLS) is further proposed based on LSH and LR. The AGLS helps determine an appropriate subpopulation size to control the learning granularity of the distributed subpopulations in different evolutionary states, balancing the exploration ability for escaping from massive suboptima against the exploitation ability for converging in the huge search space.
Experiments are conducted on all 35 benchmark functions from both the IEEE Congress on Evolutionary Computation (IEEE CEC2010) and IEEE CEC2013 large-scale optimization test suites. The results show that AGLDPSO is better than, or at least comparable to, some other state-of-the-art large-scale optimization algorithms, including the winner of the competition on large-scale optimization, showing the effectiveness and superiority of the AGLDPSO algorithm.
The remainder of this article is organized as follows. Section II reviews the traditional PSO algorithm and some PSO variants for large-scale optimization. Section III presents the AGLDPSO algorithm in detail. Section IV reports the experimental comparison between AGLDPSO and some other state-of-the-art large-scale optimization algorithms on both the IEEE CEC2010 and IEEE CEC2013 large-scale optimization test suites. Finally, the conclusion is drawn in Section V.

A. PSO
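In PSO [44], each member of the population is called a particle. Each particle $P_i$ has two vectors. The vector $V_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,D}]$ means the velocity of $P_i$, while the vector $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,D}]$ means its position, where $D$ is the dimension of the problem. In every generation, each particle $P_i$ updates its velocity and position based on its own personal best $pbest_i$ and the $gbest$ of the entire population. The velocity $V_i$ and position $X_i$ of each particle are updated according to the following formulas:

$$v_{i,d} = \omega v_{i,d} + c_1 r_1 (pbest_{i,d} - x_{i,d}) + c_2 r_2 (gbest_d - x_{i,d}) \tag{1}$$

$$x_{i,d} = x_{i,d} + v_{i,d} \tag{2}$$

where $\omega$ is the inertia weight that balances the global and local search abilities, and $r_1$ and $r_2$ are random numbers uniformly distributed in [0, 1]. $c_1$ and $c_2$ are the acceleration coefficients, where $c_1$ pulls the particle toward its own $pbest$, ensuring the diversity of the population, while $c_2$ pushes the particle toward the current $gbest$, ensuring the speed of convergence.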
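The update rule above can be illustrated with a minimal sketch. The function name `pso_step`, its default parameter values, and the per-dimension sampling of the random numbers are assumptions made for illustration; they are not taken from the article.

```python
import random


def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, rng=random):
    """Return the new (position, velocity) of one particle after one PSO update.

    x, v, pbest, gbest are lists of floats of equal length D.
    w, c1, c2 are illustrative defaults, not values prescribed by the article.
    """
    new_x, new_v = [], []
    for d in range(len(x)):
        # Assumption: fresh uniform random numbers are drawn for every dimension.
        r1, r2 = rng.random(), rng.random()
        vd = (w * v[d]
              + c1 * r1 * (pbest[d] - x[d])    # pull toward the personal best
              + c2 * r2 * (gbest[d] - x[d]))   # pull toward the global best
        new_v.append(vd)
        new_x.append(x[d] + vd)                # position update
    return new_x, new_v
```

When a particle already sits on both its personal best and the global best, the two attraction terms vanish and only the inertia term moves it.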

B. PSO Variants for Large-Scale Optimization
The traditional PSO algorithm and its variants may be promising in solving small-scale or low-dimensional problems [45]-[47]. However, when the scale or dimension of the problem increases, PSO and its variants may lose their effectiveness and feasibility. The huge search space and the exponentially increasing number of local optima are two of the most important challenges.
When using PSO to tackle large-scale optimization problems, a common method is to combine the CC framework with PSO, called CCPSO [29], [30]. For instance, CCPSO-S$_K$ [29] randomly divides the entire problem into $K$ subcomponents, each containing $D/K$ dimensions. After that, PSO is applied to optimize each subcomponent. Its variant CCPSO-H$_K$ is also proposed in [29], which combines PSO and CCPSO-S$_K$ by evolving them alternately. In CCPSO2 [30], Li and Yao developed a new scheme that dynamically determines the coevolving subcomponent size by randomly choosing from a predefined size pool. However, different subcomponent sizes are suitable for different problems. Neither a fixed subcomponent size nor a randomly chosen dynamic one can easily provide a suitable subcomponent size for different problems.
Although CCPSOs have achieved considerable success, they also have some limitations. First, the performance of CCPSOs is highly sensitive to the decomposition strategy. Second, since each subproblem has to be solved separately, a considerable number of FEs are required, especially when the dimension of the problem is large. Third, the current decomposition strategies [33]-[35] also need extra FEs to detect the dependency among variables. In addition, CCPSOs are only effective for separable problems. When dealing with partially separable or nonseparable problems, their performance deteriorates greatly.
Taking the above limitations of CCPSOs into consideration, some researchers have designed novel search strategies for PSO to explore the huge search space effectively and avoid local optima. In SL-PSO, proposed by Cheng and Jin [36], each particle learns from other better particles in the entire population after fitness sorting. They also propose CSO [37], where two particles are randomly selected from the population to compete, and then the loser learns from the winner. Different from the traditional PSO, both SL-PSO and CSO use some other better particles and the mean position of the entire population to guide the evolution, rather than using only the personal best pbest and gbest. The dynamic segment-based predominant learning swarm optimizer (DSPLSO) [48], proposed by Yang et al., follows the framework of CSO, where the worse particles learn from different better particles through the segment-based predominant learning strategy. Moreover, they also develop the dynamic level-based learning swarm optimizer (DLLSO) [49], which first separates particles into a number of levels, and then each particle is guided by two particles from different higher levels. These two methods use a different dynamic mechanism based on the softmax function and a probabilistic scheme rather than the simple random mechanism. DMS-L-PSO [42] randomly regenerates several subswarms every certain number of generations to exchange information among different subswarms and improve the population diversity. Meanwhile, it utilizes a quasi-Newton-based local search strategy to further refine the solutions.
Although many improved PSOs have been proposed to deal with large-scale optimization problems, slow convergence in the huge search space and entrapment in local optima among massive suboptima are still the main challenges in large-scale optimization. Therefore, AGLDPSO is proposed to relieve these issues.

A. Master-Slave Multisubpopulation Distributed Framework
The master-slave multisubpopulation distributed framework is illustrated in Fig. 1, where the master node controls multiple slave nodes in the parallel hardware.
During the evolution process, the master randomly divides the entire population into $\lfloor N/M \rfloor$ subpopulations of the same size and sends each subpopulation to its corresponding slave, where $N$ is the population size and $M$ is the size of the subpopulation. Note that if $N \% M \neq 0$, the last subpopulation will have $M + N \% M$ particles (where % stands for the modulo operation). Then, different subpopulations are co-evolved concurrently on their slave nodes. After the evolution, each slave sends the updated subpopulation back to the master. Different from the traditional master-slave distributed framework, the number of slaves changes adaptively, since the size (learning granularity) of the subpopulation is adaptively controlled according to the LSH and LR (shown in Section III-C).
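The partition rule described above can be sketched as follows. The helper name `partition` is hypothetical, and the sketch only shows how the master could form the groups; the actual dispatch to slave nodes is omitted.

```python
import random


def partition(population, m, rng=random):
    """Randomly split `population` into floor(N / m) subpopulations of size m.

    If N is not divisible by m, the N % m leftover particles are appended to
    the last subpopulation, which then holds m + N % m particles.
    """
    shuffled = list(population)
    rng.shuffle(shuffled)                       # random division
    k = len(shuffled) // m                      # number of subpopulations
    if k == 0:                                  # fewer than m particles in total
        return [shuffled]
    groups = [shuffled[i * m:(i + 1) * m] for i in range(k)]
    groups[-1].extend(shuffled[k * m:])         # leftovers join the last group
    return groups
```

For example, with N = 500 and M = 22 this yields 22 subpopulations, 21 of size 22 and a last one of size 22 + 16 = 38.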

B. Velocity and Position Update
Before we introduce AGLS, we first describe the velocity and position update operators in AGLDPSO. The formulas for updating the velocity and position are given as (3) and (4), which are similar to (1) and (2), but with the following two differences.
1) Strategy Difference: After the population partition, only the worst particle $P_w$ in each subpopulation is updated, by learning from the best particle in the current subpopulation (called the subpopulation best, sbest) and the best particle of the entire population (called the global best, gbest), while the other particles in the subpopulation enter the next generation directly. Because only the worst particle in each subpopulation is updated, fewer FEs are consumed per generation, which prolongs the evolutionary process and further increases the solution accuracy.
2) Parameter Difference: The inertia weight $\omega$ in PSO is often set to decrease linearly from 0.9 to 0.4, whereas in AGLDPSO it is replaced by a random number. The random mechanism helps maintain the learning diversity and the population diversity. Meanwhile, the acceleration coefficients $c_1$ and $c_2$, which control the convergence speed toward sbest and gbest, are often set to 2.0 in the traditional PSO, whereas in AGLDPSO they are set to the smaller values 1.0 and 0.1, respectively, to avoid premature convergence.
The new settings of $\omega$, $c_1$, and $c_2$ help AGLDPSO maintain the population diversity, which can effectively avoid being trapped in local optima. We have also compared AGLDPSO with the traditional PSO on the IEEE CEC2010 large-scale optimization test suite; the results, listed in Table S.I in the supplementary file, show the effectiveness and superiority of the velocity updating strategy and parameter setting in AGLDPSO.
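A minimal sketch of this worst-particle update is given below. It assumes that (3) and (4) take the same form as the standard update with pbest replaced by sbest, that the random inertia weight is drawn uniformly from [0, 1), and that the problem is a minimization problem. The function name `update_worst` and these choices are illustrative assumptions, not details confirmed by the text.

```python
import random


def update_worst(sub_x, sub_v, fitness, gbest, c1=1.0, c2=0.1, rng=random):
    """Update, in place, only the worst particle of one subpopulation.

    sub_x, sub_v: lists of position / velocity vectors (lists of floats).
    fitness: objective values of the particles (smaller is better, assumed).
    Returns the index of the particle that was updated.
    """
    idx = range(len(sub_x))
    w_idx = max(idx, key=lambda i: fitness[i])   # worst particle P_w
    s_idx = min(idx, key=lambda i: fitness[i])   # subpopulation best (sbest)
    sbest = list(sub_x[s_idx])
    omega = rng.random()                         # assumed: random inertia in [0, 1)
    for d in range(len(gbest)):
        r1, r2 = rng.random(), rng.random()
        sub_v[w_idx][d] = (omega * sub_v[w_idx][d]
                           + c1 * r1 * (sbest[d] - sub_x[w_idx][d])
                           + c2 * r2 * (gbest[d] - sub_x[w_idx][d]))
        sub_x[w_idx][d] += sub_v[w_idx][d]
    return w_idx
```

Only one new fitness evaluation per subpopulation is needed after this call, which is where the saving in FEs comes from.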

C. AGLS
The subpopulation size $M$ is important for AGLDPSO because it determines the learning granularity for the particles in each subpopulation, which in turn affects the diversity for exploration and the convergence for exploitation, so it needs to be set appropriately. For a given population size, a large $M$ means coarse granularity because the number of particles in each subpopulation is large. In this case, the sbest is the best particle of a large subpopulation, so $P_w$ learns from a large neighborhood, which helps accelerate the convergence. In contrast, a small $M$ means fine granularity, where learning takes place within small subpopulations. In this case, the number of subpopulations is large, so many different sbest particles can be learned from, which increases the diversity. Although the subpopulation size $M$ has a significant influence on the algorithm performance, precisely determining a suitable $M$ in different evolutionary states for a given problem is still hard.
In the literature, researchers have found that the exploration and exploitation abilities of EC algorithms can be adaptively adjusted according to the evolutionary state [50]-[52]. In this sense, clustering analysis has been widely used to estimate the evolutionary state so that the parameters of EC algorithms can be adjusted adaptively. However, when the dimension of the problem increases, the traditional clustering analysis approach becomes very time consuming and ineffective. In order to design a more lightweight and efficient evolutionary state estimation approach, the LSH approximation method is adopted to ensure the estimation quality and to overcome the time bottleneck [53]. After the clustering based on LSH, the learning granularity $M$ is adaptively controlled based on the LR. Therefore, we first introduce a novel clustering analysis method based on LSH and then introduce the adaptive granularity control based on LR.
1) Clustering Analysis Based on LSH: LSH has been successfully applied in EC algorithms for solving multimodal optimization problems [54]. Here, we modify this method and apply it in AGLDPSO for tackling large-scale optimization problems.
The basic idea of LSH is that two neighboring individuals in the original space have a large probability of being adjacent in the new space under the same mapping. Thus, the distance between individuals in the high-dimensional space can be calculated approximately and quickly by mapping these individuals into a low-dimensional space. A simple example of LSH in a 2-D Euclidean space is shown in Fig. 2. Each hash function is defined as a projected line in the space, and there are infinitely many hash functions. Each projected line is divided into several equal segments of size $r$. Each segment represents a "bucket" for storing the individuals. We randomly generate a hash function (projected line) and project all the individuals onto the line. Then, two close individuals will be hashed into the same bucket with a large probability. This property can be used to simplify the distance computation in clustering. Suppose that there are five individuals (A, B, C, D, and E) in the space. The similar individuals B and C are very likely to be hashed into the same bucket using the hash functions $h_2$ and $h_3$. Now, we extend LSH to problems of any dimension. In a $D$-dimensional problem, each individual $x_i$ can be expressed as a vector $(x_{i,1}, x_{i,2}, \ldots, x_{i,D})$. In every generation, we first randomly generate a $D$-dimensional vector $O$ within the search space and calculate the dot product $x_i O^T$. The dot product projects this individual (vector) onto a line. Therefore, if the individuals $x_1$ and $x_2$ are close ($\|x_1 - x_2\|$ is small), the distance between their projections, $|(x_1 - x_2) O^T|$, is very likely small and they are in the same bucket with a large probability. In contrast, if they are far from each other ($\|x_1 - x_2\|$ is large), the distance between their projections is very likely large and they are in the same bucket with a small probability.
When all these individuals have been projected, we record the maximal and minimal coordinates ($h_{\max}$ and $h_{\min}$) of all the projected points. Then, the bucket size $r$ is calculated as

$$r = \frac{h_{\max} - h_{\min}}{nb} \tag{5}$$

where $nb$ is the number of buckets. In our method, $nb$ is set as $0.1 \times N$. Next, we divide the hash line into $nb$ equal-width segments of size $r$; the hash value of each individual is the segment it is projected to. The hash function is defined as

$$h(x_i) = \left\lfloor \frac{x_i O^T + b}{r} \right\rfloor \tag{6}$$

where $b$ is a real number called the shifted projection, which is randomly generated within the interval $[0, r]$. Assume that the distance between the projections of two individuals is $d$. If $d$ is larger than $r$, the two individuals will obviously be in two different buckets. If $d$ is smaller than $r$, the probability of the two individuals being in the same bucket is $1 - d/r$. When all the individuals have been projected, the individuals in the same bucket are assigned to the same cluster.
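The clustering step can be sketched in a few lines. The sketch assumes the common projection hash of the form floor((x · O + b) / r), that the random vector O is sampled uniformly from a box-shaped search space [lower, upper]^D, and that a degenerate projection (all individuals projected to the same point) puts everyone into one cluster. The function name `lsh_clusters` and its parameters are hypothetical.

```python
import math
import random


def lsh_clusters(pop, lower, upper, nb, rng=random):
    """Assign each individual in `pop` a bucket (cluster) label via one random projection."""
    dim = len(pop[0])
    # Assumed: the random projection vector O is drawn inside the search box.
    o = [rng.uniform(lower, upper) for _ in range(dim)]
    proj = [sum(xd * od for xd, od in zip(x, o)) for x in pop]   # x_i . O
    h_min, h_max = min(proj), max(proj)
    r = (h_max - h_min) / nb              # bucket width
    if r == 0:                            # all projections coincide
        return [0] * len(pop)
    b = rng.uniform(0, r)                 # shifted projection in [0, r]
    return [math.floor((p + b) / r) for p in proj]
```

Individuals that receive the same label are treated as one cluster. Counting the labels equal to those of gbest and gworst then gives the two cluster sizes used by the granularity control.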
2) Adaptive Granularity Control Based on LR: After clustering based on LSH, the subpopulation size (granularity) $M$ is adaptively controlled based on the clustering analysis results. Consider two situations. When many more particles cluster near the globally worst particle gworst than near the globally best particle gbest, as shown in Fig. 3(a), the algorithm is in the exploration state. At this time, the gbest is far away from the current population, and the current population may get trapped in a local area. As a result, a smaller or decreasing $M$ is preferable, since it produces a larger number of subpopulations, which maintains and improves the population diversity in the exploration state. Conversely, when many more particles cluster near the gbest than near the gworst, the algorithm is in the exploitation state, as in Fig. 3(b). At this time, a larger or increasing $M$ is more suitable, since it enlarges the learning neighborhood and accelerates the convergence in the exploitation state.
Since we have two evolutionary states (exploration and exploitation) and we wish to decrease or increase the subpopulation size $M$, respectively, a binary classifier is preferred here. In ML, LR is a well-known binary classifier that maps one or more independent variables to a dichotomous outcome (0 or 1). The core of LR is the sigmoid function, which maps the variables into a probability in (0, 1), shown as

$$\sigma(z) = \frac{1}{1 + e^{-z}} \tag{7}$$

where $z$ is the linear combination of the variables. Therefore, LR can help us determine whether the algorithm is in the exploration or the exploitation evolutionary state. Herein, the input variables of LR are the number of particles in the same cluster as gworst (called $N_{gworst}$) and the number of particles in the same cluster as gbest (called $N_{gbest}$). Moreover, since we wish to adaptively increase or decrease the subpopulation size $M$, we use a variant of the sigmoid function, called the Tanh-sigmoid function, which maps the variables into the range (−1, 1), shown as

$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}} \tag{8}$$

where the input $z$ is set as the difference between $N_{gworst}$ and $N_{gbest}$, as illustrated in Fig. 4. The subpopulation size $M$ is then adaptively changed at the beginning of every generation according to (9), which implements the following adaption rules. If $N_{gworst}$ is much larger than $N_{gbest}$, the current population may be trapped in local optima. In this case, $z$ is very large and the LR returns 1, as shown in Fig. 4, so $M$ decreases by 1, and the smaller $M$ improves the population diversity. If $N_{gbest}$ is much larger than $N_{gworst}$, the global optimum region may have been found. In this case, $z$ is very small and the LR returns −1, as shown in Fig. 4, so $M$ increases by 1, and the larger $M$ speeds up the convergence. However, when $N_{gworst}$ is close to $N_{gbest}$, the current population may have a good balance between diversity and convergence. In this case, $z$ in Fig. 4 is near 0 and the LR returns 0, so $M$ remains unchanged.
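The three adaption rules can be sketched as a small helper. The exact form of the update is not reproduced here, so the sketch makes explicit assumptions: the Tanh-sigmoid is taken to be the hyperbolic tangent, its output is rounded to the nearest integer in {−1, 0, 1}, and the result is clamped to a given interval for M. The function name `adapt_granularity` and the bound parameters are hypothetical.

```python
import math


def adapt_granularity(m, n_gworst, n_gbest, m_min, m_max):
    """Return the subpopulation size for the next generation.

    n_gworst / n_gbest: sizes of the clusters containing gworst / gbest.
    m_min, m_max: allowed interval for M (assumed to be enforced by clamping).
    """
    z = n_gworst - n_gbest
    # Assumption: tanh(z) in (-1, 1) is rounded to -1, 0, or 1.
    step = round(math.tanh(z))
    # step = 1  -> exploration state, shrink M to raise diversity.
    # step = -1 -> exploitation state, grow M to speed up convergence.
    return max(m_min, min(m_max, m - step))
```

With this rounding, any nonzero integer difference already saturates the output; a scaled input (for example z divided by the population size) would make the "close to each other" band wider, which is a design choice the sketch leaves open.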
Therefore, the adaptive granularity control based on LSH and LR can relieve the parameter sensitivity and find an appropriate subpopulation size for AGLDPSO, which in turn helps strike a balance between exploration and exploitation.

D. Complete AGLDPSO Algorithm
Combining all the components mentioned above, the pseudocode of the complete AGLDPSO algorithm is outlined in Algorithm 1.
In each generation, the master first adaptively sets the size of the subpopulation (learning granularity) $M$ according to the LSH and LR using (9). Then, the entire population is randomly divided into $\lfloor N/M \rfloor$ subpopulations of the same size. Next, the master sends each subpopulation to its corresponding slave, and each subpopulation is updated on its corresponding slave. After that, each updated subpopulation is sent from its slave back to the master, and the master collects all the updated subpopulations to form a new population. This completes one generation. The procedure is repeated until the maximum number of FEs (MaxFEs) is reached.
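The generation loop described above can be sketched sequentially in one process, with the slaves simulated by a plain loop. This is a simplified illustration under several assumptions that are not taken from the article: a hash of the form floor((x · O + b) / r), a rounded hyperbolic tangent for the granularity step, M clamped to [10, floor(sqrt(N))], positions clamped to the search box, zero initial velocities, and a population of at least 10 particles. The names `agldpso_sketch` and `sphere` are hypothetical.

```python
import math
import random


def sphere(x):
    """Toy objective used only for illustration (minimization)."""
    return sum(xd * xd for xd in x)


def agldpso_sketch(f, dim, n, max_fes, lower, upper, seed=0):
    """Sequential sketch of the loop; returns (initial best, final best) fitness."""
    rng = random.Random(seed)
    xs = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n)]
    vs = [[0.0] * dim for _ in range(n)]
    fit = [f(x) for x in xs]
    fes, init_best = n, min(fit)
    m_min, m_max = 10, max(10, math.isqrt(n))    # assumed interval for M
    m, nb = m_min, max(1, n // 10)               # nb = 0.1 * N buckets
    while fes < max_fes:
        g_idx = min(range(n), key=fit.__getitem__)    # gbest
        w_idx = max(range(n), key=fit.__getitem__)    # gworst
        gbest = list(xs[g_idx])
        # Master: LSH clustering with one random projection.
        o = [rng.uniform(lower, upper) for _ in range(dim)]
        proj = [sum(a * c for a, c in zip(x, o)) for x in xs]
        r = (max(proj) - min(proj)) / nb
        if r > 0:
            b = rng.uniform(0, r)
            label = [math.floor((p + b) / r) for p in proj]
        else:
            label = [0] * n
        z = label.count(label[w_idx]) - label.count(label[g_idx])
        # Master: adaptive granularity (assumed rounding and clamping).
        m = max(m_min, min(m_max, m - round(math.tanh(z))))
        # Master: random partition; leftovers join the last subpopulation.
        order = list(range(n))
        rng.shuffle(order)
        k = n // m
        groups = [order[i * m:(i + 1) * m] for i in range(k)]
        groups[-1].extend(order[k * m:])
        # "Slaves": update only the worst particle of each subpopulation.
        for g in groups:
            s = min(g, key=fit.__getitem__)
            w = max(g, key=fit.__getitem__)
            sbest, omega = list(xs[s]), rng.random()
            for d in range(dim):
                vs[w][d] = (omega * vs[w][d]
                            + 1.0 * rng.random() * (sbest[d] - xs[w][d])
                            + 0.1 * rng.random() * (gbest[d] - xs[w][d]))
                xs[w][d] = min(upper, max(lower, xs[w][d] + vs[w][d]))
            fit[w] = f(xs[w])
            fes += 1
    return init_best, min(fit)
```

A call such as `agldpso_sketch(sphere, dim=10, n=50, max_fes=3000, lower=-5.0, upper=5.0)` runs the loop on a small toy problem. In a real distributed setting, the inner loop over `groups` would be replaced by sending each subpopulation to a slave process and collecting the results.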

A. Experimental Setup
To test the performance of AGLDPSO, two widely used large-scale optimization benchmark test suites are used. The first one is the IEEE CEC2010 test suite [55], which contains 20 large-scale optimization benchmark functions. The other one is the IEEE CEC2013 test suite [56], which contains 15 large-scale optimization benchmark functions.
We compare the results obtained by AGLDPSO with six PSO-based large-scale optimization algorithms, including CCPSO2 [30], SL-PSO [36], CSO [37], DSPLSO [48], DLLSO [49], and DMS-L-PSO [42]. Moreover, we also compare AGLDPSO with three other well-known large-scale optimization algorithms, including DECC with differential grouping (DECC-DG) [33], DECC with random grouping (DECC-G) [34], and multilevel CC (MLCC) [35]. The master-slave model of AGLDPSO is built in a multiprocessor distributed environment that consists of several distributed computing servers. The CPU of each server has eight processors configured with Intel Core i5-7400, 3.00 GHz. In this environment, each subpopulation is assigned to one processor through MPI.
The MaxFEs is set as 3 000 000 for all competitors. The population size $N$ is set as 500 in AGLDPSO, and the interval for $M$ is $[10, \sqrt{N}]$. All the algorithms are run 30 times independently for statistics, and the mean results are reported. The parameters of the compared algorithms are set the same as in their original papers for a fair comparison. In addition, Wilcoxon's rank-sum test at α = 0.05 is performed between AGLDPSO and the other state-of-the-art large-scale optimization algorithms to evaluate the statistical significance of the differences in performance [57]. The symbols "+," "≈," and "−" indicate that AGLDPSO performs significantly better than (+), similar to (≈), or significantly worse than (−) the corresponding algorithm.

B. Comparisons With State-of-the-Art Large-Scale Optimization Algorithms on the IEEE CEC2010 Test Suite
The functions in this test suite have 1000 dimensions and can be classified into three groups. The first group consists of three separable functions $f_1$-$f_3$. The second group includes the following 15 functions $f_4$-$f_{18}$, which are partially separable functions. The last group consists of the last two functions $f_{19}$ and $f_{20}$, which are nonseparable functions. All these functions are shifted and rotated, which makes them more difficult to solve and makes our test more comprehensive and convincing.
The detailed comparison results of AGLDPSO and other state-of-the-art large-scale optimization algorithms on the IEEE CEC2010 test suite are listed in Table I. For clarity, the best results are highlighted in boldface. From Table I, we can see the following.
For the first three separable functions $f_1$-$f_3$, AGLDPSO performs significantly better than most of the other algorithms, especially on $f_3$. Although it performs slightly worse than DLLSO and MLCC on these three functions, both of them lose their effectiveness when dealing with the partially separable or nonseparable functions, as discussed below.
For the last two nonseparable functions $f_{19}$ and $f_{20}$, the performance of AGLDPSO is still better than or at least comparable to that of the other algorithms; it is only worse than DECC-G on $f_{19}$ and CSO on $f_{20}$.
Overall, AGLDPSO performs better than CCPSO2, SL-PSO, CSO, DMS-L-PSO, DSPLSO, DLLSO, DECC-DG, DECC-G, and MLCC on 19, 17, 14, 20, 14, 11, 15,
To further study the evolutionary behavior of different algorithms on the IEEE CEC2010 test functions, we draw their convergence curves to observe their evolutionary processes. In order to make our comparison more convincing, we choose benchmark functions from all three groups. Here, we select the separable function $f_3$, the partially separable functions $f_4$, $f_8$, $f_9$, and $f_{13}$, and the nonseparable function $f_{20}$ as the representative instances. The convergence curves of AGLDPSO, CCPSO2, SL-PSO, CSO, DMS-L-PSO, DSPLSO, DLLSO, DECC-DG, DECC-G, and MLCC on these six selected benchmark functions are plotted in Fig. 5.
From Fig. 5(a), we can see that only AGLDPSO, DSPLSO, and DLLSO converge quickly to better solutions on the separable function $f_3$, while the other algorithms evolve more slowly. Moreover, AGLDPSO converges faster than DSPLSO and a little slower than DLLSO on this function. On the partially separable functions $f_4$ and $f_9$ in Fig. 5(b) and (d), AGLDPSO and SL-PSO achieve better and faster convergence than the other algorithms, and AGLDPSO still converges faster than SL-PSO. On the partially separable function $f_8$ shown in Fig. 5(c), most algorithms stagnate in the early stage, and only AGLDPSO obtains better and more accurate solutions. For the partially separable function $f_{13}$ and the nonseparable function $f_{20}$ in Fig. 5(e) and (f), most algorithms achieve similar performance. However, AGLDPSO still obtains more accurate results than the other algorithms and only performs a little worse than CSO on $f_{20}$.
Overall, AGLDPSO generally outperforms the other large-scale optimization algorithms on these benchmark functions from the IEEE CEC2010 test suite.

C. Comparisons With State-of-the-Art Large-Scale Optimization Algorithms on the IEEE CEC2013 Test Suite
The functions in this test suite have 1000 dimensions and can be classified into four groups. The first group includes three separable functions f1-f3. The second group consists of the following eight functions f4-f11, which are partially separable functions. The third group includes three overlapping functions f12-f14. (Note that f13 and f14 have 905 dimensions.) The last group consists of the last function f15, which is a nonseparable function.
The detailed comparison results of AGLDPSO and other state-of-the-art large-scale optimization algorithms on the IEEE CEC2013 test suite are listed in Table II. For clarity, the best results are highlighted in boldface. From Table II, we can see the following.
For the first three separable functions f1-f3, AGLDPSO dominates CSO, DMS-L-PSO, and DECC-DG on at least two functions. Moreover, AGLDPSO achieves performance at least comparable to that of the other algorithms, except DLLSO and MLCC. Although AGLDPSO performs a little worse than DLLSO and MLCC, both of those algorithms become much less effective when dealing with the other functions, as discussed below.
For the next eight partially separable functions f4-f11, AGLDPSO performs the best on f4 and f8. It dominates every other algorithm on at least four functions, while no competitor outperforms AGLDPSO on more than three functions.
For the next three overlapping functions f12-f14, AGLDPSO performs significantly better than most of the other algorithms, especially on f14. It dominates the other algorithms on at least two functions, except CSO and DSPLSO. Even though CSO and DSPLSO perform better than AGLDPSO on f12 and f13, both become much less effective when dealing with the nonseparable function, as shown next.
For the last nonseparable function f15, AGLDPSO achieves significantly the best performance and dominates all the competitors.
To further study the evolutionary behavior of different algorithms on the IEEE CEC2013 test functions, we draw their convergence curves to observe their evolutionary processes. Besides, in order to make our comparison more convincing, we choose several benchmark functions from all the four groups. Here, we select separable function f1, partially separable functions f4, f5, and f8, overlapping function f14, and nonseparable function f15 as the representative instances. The convergence curves of AGLDPSO, CCPSO2, SL-PSO, CSO, DMS-L-PSO, DSPLSO, DLLSO, DECC-DG, DECC-G, and MLCC on these six selected benchmark functions are plotted in Fig. 6.
From Fig. 6(a), we can see that only AGLDPSO and MLCC converge quickly to better solutions on separable function f1, while the other algorithms evolve more slowly. However, AGLDPSO converges a little more slowly than MLCC. On partially separable function f4 in Fig. 6(b), only AGLDPSO and SL-PSO obtain more accurate results. A similar phenomenon can be observed on partially separable function f8 and overlapping function f14 in Fig. 6(d) and (e). On partially separable function f5 in Fig. 6(c), AGLDPSO performs much better and obtains more promising results than most algorithms; it is worse only than CSO, DSPLSO, and DLLSO. On the nonseparable function f15 in Fig. 6(f), most algorithms achieve similar performance. However, AGLDPSO obtains much better and more accurate results than the other competitors.
Overall, AGLDPSO generally outperforms the other large-scale optimization algorithms on these benchmark functions from the IEEE CEC2013 test suite.

D. Comparison With Winner of IEEE CEC2010 Competition
To further demonstrate the superiority of AGLDPSO, in this section, we compare AGLDPSO with the winner of the IEEE CEC2010 competition on large-scale optimization, the memetic algorithm based on local search chains (MA-SW-Chains) [58]. For a fair comparison, we directly cite the mean results of MA-SW-Chains from the original paper [58].
The detailed comparison results of AGLDPSO and MA-SW-Chains are listed in Table III. The best results are highlighted in boldface. Because the detailed per-run results of MA-SW-Chains are not available, whether AGLDPSO is better than (+), worse than (−), or similar to (≈) MA-SW-Chains is measured by the mean results only.
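As a minimal sketch of how such a mean-only comparison could be tallied, the snippet below labels each function from two mean errors under minimization. The function name `compare_means`, the relative tolerance used to decide "similar", and the placeholder numbers are all assumptions for illustration; they are not taken from Table III or from the original experimental procedure.

```python
import math


def compare_means(ours: float, theirs: float, rel_tol: float = 1e-9) -> str:
    """Label a minimization result as '+', '-', or '≈' using mean errors only."""
    # Hypothetical tie rule: means that agree within rel_tol count as similar.
    if math.isclose(ours, theirs, rel_tol=rel_tol, abs_tol=0.0):
        return "≈"
    # Smaller mean error is better under minimization.
    return "+" if ours < theirs else "-"


# Placeholder (ours, theirs) pairs; NOT results from the paper.
placeholder_pairs = [(1.0e-3, 2.0e-3), (5.0, 5.0), (7.0e2, 3.0e2)]

labels = [compare_means(a, b) for a, b in placeholder_pairs]
tally = {symbol: labels.count(symbol) for symbol in ("+", "-", "≈")}
print(labels, tally)
```

Because only the means are compared, such a tally carries no statistical significance information, which is why the text describes the comparison as measured by the mean results only.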
From Table III, we find that AGLDPSO maintains its promising performance when compared with MA-SW-Chains.

E. Scalability of AGLDPSO on 2000-D Problems
In order to investigate the scalability of AGLDPSO, we further compare the performance of AGLDPSO with other large-scale optimization algorithms on the IEEE CEC2010 test functions with the dimensionality increased to 2000.
When dealing with the 2000-D problems, MaxFEs is set as 6 000 000 for all competitors, while the population size N is set as 1000 in AGLDPSO. Moreover, the parameter c2, which controls the convergence speed to the gbest, is set as 0.2. The detailed experimental results can be seen in Table S.II in the supplementary file. It can be observed that as the dimension increases, the performance of many algorithms deteriorates greatly, except AGLDPSO. Besides, MLCC may be more suitable for solving separable functions, while CSO performs relatively better on some partially separable functions. Even so, AGLDPSO still keeps its overall advantage across the test functions. It performs better than CCPSO2, SL-PSO, CSO, DMS-L-PSO, DSPLSO, DLLSO, DECC-DG, DECC-G, and MLCC on 19, 15, 13, 20, 11, 11, 17, 16, and 18 functions, respectively. Conversely, CCPSO2, SL-PSO, CSO, DSPLSO, DLLSO, DECC-DG, DECC-G, and MLCC can only surpass AGLDPSO on 1, 2, 7, 8, 8, 2, 4, and 2 functions, respectively. DMS-L-PSO cannot outperform AGLDPSO on any function. These results demonstrate that AGLDPSO can also achieve good performance even when the dimension increases to 2000.

F. Effects of AGLS
In this section, we discuss the property and influence of AGLS, which adapts the subpopulation size and thereby controls the learning granularity. To investigate the effectiveness of the newly proposed AGLS, we compare AGLDPSO with three AGLDPSO variants with fixed subpopulation size and one AGLDPSO variant with random subpopulation size. We denote the AGLDPSO variant with fixed subpopulation size M = a as AGLDPSO(a) and denote the AGLDPSO variant with random subpopulation size as AGLDPSO(rand). The comparison results of AGLDPSO and its variants on the IEEE CEC2010 test functions are listed in Table IV.
As we can see, different subpopulation sizes are suitable for different problems. For instance, a large subpopulation size (coarse granularity) is appropriate for exploitation and performs well on unimodal functions f4, f12, and f19, while a small subpopulation size (fine granularity) is suitable for maintaining diversity and performs well on multimodal functions f2, f11, and f16. However, the adaptive granularity control achieved by AGLS, without any prior information, still outperforms AGLDPSO(10), AGLDPSO(15), and AGLDPSO(20) on 11, 12, and 13 functions, respectively, while it is worse than these three variants on only 5, 1, and 4 functions, respectively. Moreover, it performs better than AGLDPSO(rand) on 16 functions and is worse than AGLDPSO(rand) on only 1 function. This may be because the AGLS can estimate the evolutionary state based on LSH and LR and then adaptively change the subpopulation size to meet the search requirement of the current evolutionary state. Therefore, benefiting from the adaptive granularity control, we not only eliminate the sensitivity to this parameter but also find a potential balance between exploration and exploitation to obtain better performance on a broad range of functions.
Moreover, from Table IV, we can see that AGLDPSO generally consumes more CPU time (measured in seconds) than its variants with fixed or random subpopulation size. This may be because determining the adaptive subpopulation size M in AGLS involves hashing-based clustering and LR, which are relatively time consuming. Although the AGLS adds some extra CPU time to AGLDPSO, it also helps AGLDPSO find a suitable subpopulation size to effectively control the learning granularity, which further balances exploration and exploitation and yields better results.
Next, we further count how often each subpopulation size M occurs during the evolutionary process in AGLDPSO. From the above experiments, we find that the large subpopulation size (coarse granularity) performs well on unimodal functions f4, f12, and f19, while the small subpopulation size (fine granularity) is suitable for multimodal functions f2, f11, and f16. Thus, we take these six functions as the representative instances to further verify the effectiveness of adaptive granularity control. The statistical results of M in the evolutionary process on these six functions are shown in Fig. 7.
From Fig. 7, we can see that the small subpopulation size appears more frequently than the large subpopulation size in all the listed functions. This may be because more diversity is needed when solving large-scale optimization problems. A small subpopulation size M results in a large number of subpopulations, so many sbest can be learned from to increase the diversity. As a result, a small subpopulation size is needed more often to keep population diversity when solving large-scale optimization problems. Moreover, Fig. 7 also shows that the small subpopulation size M appears more frequently in multimodal functions f2, f11, and f16 than in unimodal functions f4, f12, and f19. This is because a small M can improve the population diversity to avoid local optima and is therefore needed more in the multimodal environment. On the contrary, the large subpopulation size M appears more frequently in unimodal functions f4, f12, and f19 than in multimodal functions f2, f11, and f16, because a large M can further accelerate the convergence speed and improve the accuracy of the solution in the unimodal environment. Therefore, we can say that our adaptive granularity control method in the AGLDPSO algorithm is effective and adaptive to the evolutionary states of different problems.

G. Influences of Parameters
The bucket number nb has an important influence on the effectiveness of AGLS. If nb is small, the bucket size will be large. Thus, gbest and gworst will fall in the same bucket with a higher probability, and there is then no difference between $N_{gworst}$ and $N_{gbest}$. If nb is large, the bucket size will be small and the population will be divided more dispersedly. Therefore, the number of particles in each bucket is relatively small and there is still no significant difference between $N_{gworst}$ and $N_{gbest}$. Since the subpopulation size M is adaptively changed according to the difference between $N_{gworst}$ and $N_{gbest}$, neither a very large nor a very small nb is suitable for AGLS.
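The effect of nb on the two neighbour counts can be illustrated with a small sketch. The hash below (a single random projection cut into nb equal-width bins) is only an illustrative stand-in and is an assumption, not the LSH scheme used in AGLDPSO; the names `bucket_ids` and `neighbour_counts`, the sphere objective, and the use of the current best and worst particles are likewise hypothetical.

```python
import random


def bucket_ids(positions, nb, rng):
    """Assign each particle to one of nb buckets via a random 1-D projection."""
    dim = len(positions[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = [sum(x * w for x, w in zip(p, direction)) for p in positions]
    lo, hi = min(proj), max(proj)
    width = (hi - lo) / nb or 1.0  # fall back to 1.0 if all projections coincide
    # Clamp so the particle with the largest projection lands in the last bucket.
    return [min(int((v - lo) / width), nb - 1) for v in proj]


def neighbour_counts(positions, fitness, nb, rng):
    """Return (n_gbest, n_gworst): sizes of the buckets holding the best and worst particle."""
    ids = bucket_ids(positions, nb, rng)
    best = min(range(len(fitness)), key=fitness.__getitem__)   # minimization
    worst = max(range(len(fitness)), key=fitness.__getitem__)
    return ids.count(ids[best]), ids.count(ids[worst])


rng = random.Random(0)
N, D = 200, 10
positions = [[rng.uniform(-5.0, 5.0) for _ in range(D)] for _ in range(N)]
fitness = [sum(x * x for x in p) for p in positions]  # sphere function as a stand-in

for nb in (1, N // 10, N):
    n_gbest, n_gworst = neighbour_counts(positions, fitness, nb, rng)
    print(f"nb={nb}: n_gbest={n_gbest}, n_gworst={n_gworst}")
```

With nb = 1 every particle shares one bucket, so both counts equal N and their difference is zero, which matches the small-nb case described above. With nb close to N the buckets become sparse, so both counts tend to be small.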
The bucket number nb is tested with four values: 0.05N, 0.1N (the one used in AGLDPSO), 0.15N, and 0.2N. The AGLDPSO variant with nb = λ × N is called AGLDPSO(λ). For example, the AGLDPSO variant with nb = 0.05N is called AGLDPSO(0.05). The comparison results of AGLDPSO and its variants with different nb values on the IEEE CEC2010 test suite are listed in Table S.III in the supplementary file.
As we can see, different nb values make nearly no difference in AGLDPSO on f1, f4, f5, f15, and f20. The larger value nb = 0.2N performs the best on f7, while the smaller value nb = 0.05N performs the best on f2, f8, and f10. However, on the other functions, AGLDPSO with nb = 0.1N performs better than AGLDPSO(0.05), AGLDPSO(0.15), and AGLDPSO(0.2) on 10, 11, and 12 functions, respectively. Therefore, nb = 0.1N is the most suitable setting for AGLDPSO to achieve good performance in our testing.
Moreover, we also investigate the influence of the population size N, which is a hyperparameter in AGLDPSO. Generally speaking, a larger population size N can maintain population diversity and enhance the exploration ability of the algorithm, while a smaller population size N consumes fewer FEs in every generation, which allows more evolutionary generations and can further improve the accuracy of solutions.
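The trade-off can be made concrete with a short calculation. The sketch below assumes that each particle consumes exactly one FE per generation and that the budget is 3 000 000 FEs; both are assumptions for illustration, since the budget used in this parameter study is not restated here. The helper name `generations_available` is hypothetical.

```python
def generations_available(max_fes: int, population_size: int) -> int:
    """Number of full generations that fit in the FE budget (one FE per particle assumed)."""
    return max_fes // population_size


MAX_FES = 3_000_000  # assumed budget, for illustration only

budget = {n: generations_available(MAX_FES, n) for n in (100, 300, 500, 600, 1000)}
for n, gens in budget.items():
    print(f"N = {n:4d} -> {gens} generations")
```

Under these assumptions, going from N = 100 to N = 1000 cuts the number of generations by a factor of ten, which is the mechanism behind the accuracy loss discussed for very large populations.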
N is tested with five values: 100, 300, 500 (the one used in AGLDPSO), 600, and 1000. The AGLDPSO variant with N = a is called AGLDPSO(a). For example, the AGLDPSO variant with N = 100 is called AGLDPSO(100). The comparison results of AGLDPSO and its variants with different N values on the IEEE CEC2010 test suite are listed in Table S.IV in the supplementary file.
As we can see from Table S.IV in the supplementary file, AGLDPSO variants with larger population sizes generally perform better than the AGLDPSO variants with smaller population sizes. In particular, the AGLDPSO variant with the smallest population size, that is, N = 100, performs the worst in our test. This may be because more diversity is needed when solving large-scale optimization problems. A larger population size N can increase the diversity, which is more suitable for large-scale optimization problems. Nevertheless, a population size N that is too large consumes many FEs in every generation and therefore makes the algorithm terminate early, after fewer generations. This prevents the algorithm from searching sufficiently for the global optimum. For example, both AGLDPSO(600) and AGLDPSO(1000) perform worse than AGLDPSO on f5, f9, f11, f12, f14, f15, f17, and f19. All in all, AGLDPSO with N = 500 performs significantly better than AGLDPSO(100), AGLDPSO(300), AGLDPSO(600), and AGLDPSO(1000) on 20, 15, 13, and 11 functions, respectively, while it is significantly beaten by them on only 0, 4, 2, and 7 functions, respectively. Therefore, N = 500 is the most suitable setting for AGLDPSO to achieve good performance in our test.

V. CONCLUSION
This article develops an algorithm called AGLDPSO with the help of ML techniques for large-scale optimization. Two major novel techniques are designed: 1) master-slave multisubpopulation distributed model and 2) AGLS.
Several subpopulations are co-evolved by using the master-slave multisubpopulation distributed model. Compared with other large-scale optimization algorithms with single population evolution or a centralized mechanism, the distributed multisubpopulation co-evolution mechanism fully exchanges the evolutionary information among different subpopulations to further enhance the population diversity. Furthermore, the AGLS based on LSH and LR in AGLDPSO can find an appropriate subpopulation size in different evolutionary states, which relieves the sensitivity to this parameter. Moreover, the adaptive subpopulation size can control the learning granularity effectively and can find a potential balance between diversity and convergence.
Equipped with these two novel techniques, AGLDPSO achieves promising performance when solving large-scale optimization problems. The comparison results between AGLDPSO and other large-scale optimization algorithms, including the winner of the competition on large-scale optimization, show the efficiency and feasibility of AGLDPSO.
$r_{1i}^{d}$ and $r_{2i}^{d}$ are two uniformly distributed random numbers within [0, 1]. A particle's velocity and position on each dimension are clamped in $[-V_{\max}^{d}, V_{\max}^{d}]$ and $[X_{\min}^{d}, X_{\max}^{d}]$, respectively.
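The per-dimension clamping described above can be sketched as follows. This is a minimal illustration, assuming simple saturation at the bounds; the helper names `clamp` and `clamp_particle` and the example bounds are hypothetical, and the point in the update at which clamping is applied is not specified here.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Saturate value to the closed interval [low, high]."""
    return max(low, min(value, high))


def clamp_particle(velocity, position, v_max, x_min, x_max):
    """Clamp each velocity component to [-v_max[d], v_max[d]] and each position to [x_min[d], x_max[d]]."""
    new_v = [clamp(v, -vm, vm) for v, vm in zip(velocity, v_max)]
    new_x = [clamp(x, lo, hi) for x, lo, hi in zip(position, x_min, x_max)]
    return new_v, new_x


# Example with two dimensions and hypothetical bounds.
clamped_v, clamped_x = clamp_particle(
    velocity=[3.0, -0.5],
    position=[120.0, -130.0],
    v_max=[1.0, 1.0],
    x_min=[-100.0, -100.0],
    x_max=[100.0, 100.0],
)
print(clamped_v, clamped_x)
```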

Fig. 3. Illustration of the evolutionary state based on the population distribution. (a) Exploration. (b) Exploitation.

Fig. 7. Statistical results of subpopulation size M on different functions.

The best one of all the pbest i is treated as the globally best of the entire population, called gbest = [gbest 1, gbest 2, ..., gbest D].
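A minimal sketch of this selection step, assuming minimization and plain Python lists for the personal bests, is given below. The function name `select_gbest` and the example values are hypothetical.

```python
def select_gbest(pbest_positions, pbest_fitness):
    """Return a copy of the personal best with the smallest fitness, and that fitness."""
    best_index = min(range(len(pbest_fitness)), key=pbest_fitness.__getitem__)
    # Copy the position so later updates to pbest do not silently change gbest.
    return list(pbest_positions[best_index]), pbest_fitness[best_index]


# Three particles in two dimensions with placeholder fitness values.
pbest_positions = [[1.0, 2.0], [0.1, -0.2], [3.0, 0.0]]
pbest_fitness = [5.0, 0.05, 9.0]

gbest, gbest_fitness = select_gbest(pbest_positions, pbest_fitness)
print(gbest, gbest_fitness)
```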

TABLE IV EXPERIMENTAL RESULTS OF AGLDPSO AND ITS VARIANTS WITH DIFFERENT SUBPOPULATION SIZES ON 1000-D IEEE CEC2010 FUNCTIONS