Research on Large-Scale Bi-Level Particle Swarm Optimization Algorithm

To address the slow convergence and local-optimum problems of particle swarm optimization (PSO), a large-scale bi-level particle swarm optimization algorithm is proposed in this paper, which enlarges the swarm scale and enhances initial population diversity on the basis of multiple particle swarms. The algorithm also improves the running efficiency of the particle swarms through the structural advantages of the bi-level design: the upper-level particle swarm provides decision-making information while the lower-level working particle swarms run at the same time, and the two levels collaborate closely with each other. To prevent premature convergence of the population and slow convergence in the later stage, an acceleration factor based on an increasing exponential function is applied to control the coupling among the particle swarms. Simulation results show that the large-scale bi-level particle swarm optimization algorithm offers better optimization performance and stability.


I. INTRODUCTION
Particle Swarm Optimization (PSO) is an evolutionary computing technique [1], [2] derived from the study of bird predation behavior. Similar to the genetic algorithm, PSO is an iteration-based optimization tool. It first initializes a set of random solutions, treats each individual as a particle with neither weight nor volume in n-dimensional space, and then searches for the optimum by iteration, so that the particles in the solution space follow the optimum particle during the search. Owing to its rapid search speed, good initial convergence, and easy implementation, the PSO algorithm has been widely used in many fields [3]-[11]. However, it also has disadvantages, especially when solving complicated problems, such as poor diversity in the later stage, reduced evolution speed, and unsatisfactory optimization accuracy. Although PSO has few parameters to adjust, improper parameter settings can trap the algorithm in problems such as "precocity" (premature convergence) and local optima.
The associate editor coordinating the review of this manuscript and approving it for publication was Seyedali Mirjalili.

To solve these problems, many scholars have conducted research from various angles. Some tried to improve the algorithm through parameter settings, mainly adjusting the inertia weight [12]-[15] and the acceleration factor [16], [17]. The inertia weight setting affects both the global and local search capabilities of the algorithm, so an appropriate inertia weight should be selected to balance search capability and accuracy. The learning-factor setting, on the other hand, determines how strongly the individual historical optimum and the group historical optimum influence a particle's trajectory: too large a learning factor may make particles jump out of the optimal area, while too small a factor may cause particles to oscillate in areas far from the target. Other scholars tried to improve the optimization performance of the algorithm by changing the population topology [18], because a change of topology can preserve population diversity and prevent the algorithm from being trapped in local optima. However, no single topology suits all benchmark functions; the choice of topology depends on the specific problem model. Besides, hybrid PSO variants [19] formed by combining PSO with the genetic algorithm [20]-[22], the ant colony algorithm [23]-[25], or other optimization algorithms can compensate for PSO's defects to a certain extent. But all of these improvements are based on a single swarm; although they improve performance to some degree, they still suffer from premature convergence and low solution accuracy. With further study of the algorithm, some scholars began to investigate multiple swarms as a way to improve it.
Luo Dexiang divided a single particle swarm into three sub-swarms and compared the optimal values obtained by the independent runs of the three sub-swarms to obtain the global optimum [26]. Although this algorithm alleviates the local-optimum problem to a certain extent, it sacrifices accuracy. In [27], Lovbjerg et al. divided the population into multiple sub-populations and used the optimal particles of the sub-populations to replace the population-optimal particle in the velocity-update formula of the basic PSO algorithm, which reduces the algorithm's risk of local optima. However, the only interaction among sub-populations is parental reproduction between different sub-populations, so the information exchange between them is insufficient. Liu et al. [28] used the K-means clustering algorithm to divide the population into several sub-populations and strengthened the information exchange between particles by periodically reconstructing the sub-populations. To avoid local optima, some scholars proposed a particle swarm optimizer with two differential mutations [29], and others introduced a crossover operator to improve the performance of bi-level particle swarm optimization [30]. Although these algorithms alleviate the local-optimum problem to some extent, they prolong the running time of the algorithm.
Targeting the above problems, we propose a new large-scale bi-level particle swarm optimization algorithm, which better balances the global and local search abilities of the algorithm and maintains good calculation accuracy while reducing running time. In the bi-level particle swarm optimization structure, a learning-factor strategy following an exponential-function distribution is proposed to guide the lower-level working particle swarms: the upper-level particle swarm optimizes the evolution direction while the lower-level particle swarms increase population diversity, and the upper and lower levels operate in coordination and work together. The scale of the particle swarm is enlarged and the lower-level particle swarms run in parallel, which improves population diversity in the initial stage, improves the accuracy of the algorithm, and reduces running time. Finally, we verify the effectiveness of the algorithm through comparative experiments.
The rest of this paper is organized as follows: Section II briefly reviews the basic particle swarm optimization (BPSO); Section III introduces the proposed method in detail; Section IV presents our experiments and the final experimental results; and Section V summarizes our research work and discusses future directions.

II. THE BASIC PARTICLE SWARM OPTIMIZATION ALGORITHM THEORY
The basic particle swarm optimization algorithm first initializes n random particles as a particle swarm X = (X_1, X_2, ..., X_n), where each particle represents one potential optimal solution and d denotes the number of dimensions, equal to the number of unknowns of the problem to be solved. The position of particle i in the d-dimensional search space is expressed as X_i = [X_i1, X_i2, ..., X_id]^T. Each particle has a corresponding fitness value determined by the optimized function, which is used to judge the particle's performance. During each iteration, two positions are recorded for each particle: P_i = [P_i1, P_i2, ..., P_id]^T, the best position that particle X_i has found, and G = [G_1, G_2, ..., G_d]^T, the best position that the whole swarm has found. Each particle adjusts its position by tracking these two positions, and PSO achieves its search through such iterated population updates, as given by formulas (1) and (2) (reconstructed here in their standard form):

V_id^(k+1) = ω V_id^k + c_1 r_1 (P_id − X_id^k) + c_2 r_2 (G_d − X_id^k)   (1)

X_id^(k+1) = X_id^k + V_id^(k+1)   (2)
In formulas (1) and (2), ω is the inertia weight, which significantly affects the search capability of the algorithm: a larger ω favors global search, while a smaller one favors local search; k is the current iteration number; c_1 and c_2 are the acceleration factors, which adjust the step length with which a particle moves toward the individual best position and the global best position. Acceleration factors that are too large or too small may drive the particles away from the optimum or cause particle oscillation. r_1 and r_2 are random numbers uniformly distributed in [0, 1]. Iteration terminates when a pre-set maximum number of iterations is reached or the optimization result satisfies the accuracy requirement; the global extreme value G at termination is the final optimum. To improve search efficiency and prevent particles from leaving the search space, the particle velocity in each dimension is restricted to [−V_max, V_max]; commonly, V_max does not exceed the width of the particle's search range. Likewise, the particle position in each dimension is restricted to [−X_max, X_max], and any particle that leaves the solution space has its information re-updated to this range.
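The update described by formulas (1) and (2) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function name `bpso_step` and the default parameter values are our own choices.

```python
import random

def bpso_step(X, V, P, G, w=0.7, c1=2.0, c2=2.0, v_max=1.0, x_max=5.0):
    """One iteration of the basic PSO update, formulas (1) and (2).

    X, V : lists of particle positions/velocities (each a list of d floats)
    P    : per-particle best positions; G : global best position.
    """
    for i in range(len(X)):
        for d in range(len(X[i])):
            r1, r2 = random.random(), random.random()
            # formula (1): inertia + cognitive + social components
            V[i][d] = (w * V[i][d]
                       + c1 * r1 * (P[i][d] - X[i][d])
                       + c2 * r2 * (G[d] - X[i][d]))
            # clamp velocity to [-v_max, v_max]
            V[i][d] = max(-v_max, min(v_max, V[i][d]))
            # formula (2): position update, clamped to the search range
            X[i][d] = max(-x_max, min(x_max, X[i][d] + V[i][d]))
    return X, V
```

After each step, P and G would be refreshed wherever a particle's new position yields a better fitness value.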
BPSO is not sensitive to environmental changes and is easily dominated by P_i or G, so it is difficult for it to converge to the global optimum. This is especially true for complex multimodal functions in high-dimensional space: such functions have many local optima, which easily attract the swarm, leading the algorithm into local optima and premature convergence. Experiments indicate that merely changing the inertia weight ω or the acceleration factors c_1 and c_2 has no significant effect on the efficiency and stability of the algorithm.

III. LARGE-SCALE BI-LEVEL PARTICLE SWARM OPTIMIZATION ALGORITHM
A. BI-LEVEL MULTI-PARTICLE SWARM
Most previous optimizations of particle swarms targeted the evolution of the current swarm. However, single-swarm PSO easily falls into local optima and suffers from low search efficiency and poor stability in the later stage. Compared with single-swarm PSO, a multi-swarm optimization algorithm has a significant structural advantage: when one particle swarm falls into a local optimum, other swarms can provide it with their particle information to help it escape. Information exchange among multiple swarms can greatly improve the search efficiency of the swarms in the later stage. On the other hand, it is generally believed that the larger the population size, the higher the accuracy and the better the stability of PSO. To solve complicated problems faster, the swarm size can be enlarged to broaden the search range of the particle swarm and enhance the exploration capability of the population.
To balance the exploration and exploitation capabilities of particle swarms, a large-scale bi-level PSO is proposed in this paper: it lets the particle swarms exploit their large scale, and overcomes the local-optimum and slow-convergence problems of single-level swarms by introducing a bi-level swarm structure. The structural diagram of the algorithm is shown in Figure 1. As shown in Figure 1, the large-scale bi-level PSO algorithm is divided into upper and lower levels. The lower-level particles, which contain rich information, are divided into small sub-populations. Formula (3) gives the velocity of particle i in sub-population j of the lower level, Formula (4) gives its current position, and Formula (5) gives the individual extreme value of sub-population j of the lower level.
Each lower-level working particle swarm finds an optimal solution of its sub-population, which affects the movement of the particles in that swarm to a certain extent. The lower-level particle swarms provide their optimal particle information, and the mean value of this information is used to seek the global optimum. The upper-level decision-making particle swarm P_s collects the optimal information of the lower-level working particle swarms P_j (j is the label of the lower-level swarm), processes it into decision information, and feeds it back to each lower-level working swarm. In formula (6), a population of size N is divided into m sub-populations, each of size P_j = floor(N/m), where floor denotes rounding down and j = 1, 2, ..., m. In formula (7), X_xs serves as the decision-making information of the upper-level particle swarm: it guides the evolution direction of the lower-level particle swarms, directs the lower-level working swarms to explore the region that may contain the optimum, and makes the lower-level particles fly toward the optimal position. When a lower-level particle swarm falls into a local optimum, the upper-level decision-making particle swarm can help it escape.
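The population division and the decision information described by formulas (6) and (7) can be sketched as follows. The helper names are ours, and formula (7) is assumed, per the text, to be the mean of the sub-swarms' optimal particle positions.

```python
def split_population(N, m):
    """Formula (6): divide N particles into m sub-swarms of size floor(N/m).

    Returns a list of m index lists (any remainder particles are not
    assigned, matching the floor(N/m) sub-swarm size in the paper).
    """
    size = N // m  # floor(N/m)
    return [list(range(j * size, (j + 1) * size)) for j in range(m)]

def decision_information(sub_bests):
    """Formula (7) as described in the text: X_xs is taken as the mean of
    the optimal particle positions found by the lower-level sub-swarms."""
    m = len(sub_bests)
    dims = len(sub_bests[0])
    return [sum(best[d] for best in sub_bests) / m for d in range(dims)]
```

For example, with N = 10 particles and m = 3 sub-swarms, each sub-swarm holds floor(10/3) = 3 particles.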
In the early stage of the iteration, the lower-level working particle swarms are scattered randomly. To enhance the exploration capability of the algorithm in this stage, the guidance of the upper-level decision-making swarm over the lower-level working swarms should be loosely coupled. In the later stage, when the lower-level swarms approach the optimal solution and the lower-level working swarms carry richer information, strengthening the guidance of the upper-level decision-making swarm over the lower-level working swarms improves both the exploration and exploitation capabilities of the algorithm. For this reason, a learning-factor strategy following an exponential distribution is adopted.
The value of a in formula (8) needs to be set according to the specific problem. Combining formulas (7) and (8), the iterative formulas (1) and (2) evolve into formulas (9) and (10).
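Since formulas (8)-(10) are not reproduced in this text, the following sketch shows one plausible reading: a coupling factor that grows exponentially with the iteration count (weak guidance early, strong guidance late), added to the basic update as a third attraction term toward the decision information X_xs. The exact functional forms, names, and constants here are assumptions, not the paper's formulas.

```python
import math
import random

def coupling_factor(k, K, a=3.0, c3_max=2.0):
    """Assumed form of formula (8): an increasing exponential schedule,
    0 at the first iteration (k = 0) and c3_max at the last (k = K)."""
    return c3_max * (math.exp(a * k / K) - 1.0) / (math.exp(a) - 1.0)

def bilevel_velocity(v, x, p, g_j, x_xs, k, K, w=0.7, c1=2.0, c2=2.0):
    """Assumed shape of formulas (9)-(10): the basic update plus a third
    term pulling the particle toward the upper-level decision info x_xs."""
    r1, r2, r3 = random.random(), random.random(), random.random()
    c3 = coupling_factor(k, K)
    v_new = (w * v + c1 * r1 * (p - x) + c2 * r2 * (g_j - x)
             + c3 * r3 * (x_xs - x))
    return v_new, x + v_new  # new velocity and new position
```

With this schedule the upper-level guidance is negligible in the early, exploratory phase and dominant near the end of the run, matching the loose-to-tight coupling described above.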

B. ANALYSIS ON PARAMETERS OF POPULATION SIZE
In BPSO, the population size N is an important parameter: it represents the number of particles in the swarm and strongly affects the algorithm's convergence speed, accuracy, and stability. Generally, the number of particles is determined by the complexity of the problem under study. In this paper, N = 50, 200, 400, 800 were selected to analyze the effect of N on algorithm performance. Figure 2 shows the curves of the fitness value against the number of iterations for these different values of N.
In the experiment, the number of iterations was set to 500 and the population size N was varied; 30 evolutionary runs were conducted on each testing function. The effects of N on algorithm performance are shown in Figure 2. It can be seen from Figure 2 that, as the population size increases, the PSO achieves better results: the larger the population, the higher the search accuracy. With a small number of particles, the exploration of the swarm is inadequate; for complicated, high-dimensional problems in particular, a large population shows more significant advantages.
However, when the swarm size reached 200, 400, and 800, although the mean optimal fitness value improved compared with cases where N was below 100, the improvement became less significant as the particle number increased, and the optimal fitness values varied little. Moreover, in a single particle swarm, the larger the population, the faster the convergence at the initial stage of iteration; but during this convergence, the swarm easily loses diversity and fails to maintain the global convergence capability of the algorithm. As the population grows, the computation overhead increases sharply and the running time also increases, so single-swarm PSO shows no obvious advantage in evolutionary operation and brings no further gains. These problems can be well addressed by the bi-level particle swarm optimization algorithm, which exploits the advantages of its large population and its structure to balance accuracy and stability improvement against running-time reduction.

C. PARALLEL RUNNING OF MULTI-PARTICLE SWARMS
For complex problems, the search space of multi-swarm evolutionary computation enlarges sharply, and running on a single CPU usually takes a long time due to low operating efficiency. In this case, parallel computation can be introduced among the particle swarms: several CPUs working in parallel can effectively resolve the over-long computation time of large-scale computation. The number of lower-level working swarms is determined first, then the number of sub-threads in the primary thread is set according to the number of working swarms, and each thread is assigned specific tasks. There are two modes of information exchange: synchronous and asynchronous. In asynchronous exchange, as soon as one thread finishes updating its individual optimal position and velocity, it immediately provides this information to the upper-level decision-making swarm for judging the next flight direction. In synchronous exchange, the particles of all threads first finish updating their positions and velocities, and only then are the global optimal individual position and velocity evaluated and updated on the basis of this information to decide the next flight direction. The parallel algorithm adopted in this paper is essentially a simulation of bird-flock predatory behavior: the flock leader divides the flock into multiple groups; after a period of search, each group informs the leader of the best position it has found; and the leader judges which position is most likely to yield food and directs the flock toward it.
Therefore, this algorithm adopts the synchronous mode of parallel information exchange: when the particles of all threads have finished their updates, the upper-level decision-making swarm collects the optimal particle information of the lower-level working swarms, processes it into decision-making information, and informs the lower-level swarms. Thus, besides the effects of its own optimal position and the optimal particle position of its lower-level working sub-population, each particle is also affected by the overall decision-making information of the whole group when updating its next flight position.
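The synchronous exchange can be sketched with a thread pool: every lower-level sub-swarm completes its update before the upper level aggregates their best particles. The helper names (`step_fn`, `decide_fn`) are hypothetical placeholders for the sub-swarm update and the upper-level decision step.

```python
import concurrent.futures

def run_iteration_synchronously(sub_swarms, step_fn, decide_fn):
    """One synchronous-exchange iteration: each worker thread updates one
    lower-level sub-swarm; the upper level aggregates only after all
    workers have reported back (a sketch, not the paper's implementation)."""
    with concurrent.futures.ThreadPoolExecutor(
            max_workers=len(sub_swarms)) as pool:
        # each worker updates one sub-swarm and returns its best particle
        bests = list(pool.map(step_fn, sub_swarms))
    # the upper-level decision-making swarm runs once all results are in
    return decide_fn(bests)
```

Note that for CPU-bound fitness functions in CPython, `ProcessPoolExecutor` would be the more effective drop-in substitute; the synchronization pattern is the same.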
During the evolutionary process, the multi-swarm optimization algorithm divides the particle swarm into several sub-swarms, which are evenly distributed in the solution space to ensure particle diversity. Each sub-swarm only needs to complete its own internal evolution. At the end of each iteration, the swarms communicate, so that when any sub-swarm falls into a local optimum, the other swarms provide it with their position information; this information gives the trapped swarm a direction and helps it escape the local optimum (see the accompanying figure). The large-scale bi-level particle swarm optimization algorithm can thus solve the problem of particle swarms falling into local optima and help trapped sub-swarms escape during the iterative process. The lower-level working particle swarms have three main types of status. First, all lower-level working swarms tend toward the global optimum; the upper-level decision-making swarm then provides better guidance by integrating the status of each lower-level working swarm. Second, some lower-level working swarms fall into local optima while the rest tend toward the global optimum; the upper-level decision-making swarm then guides the lower-level swarms with comprehensive information, directing the trapped swarms toward the global optimum and helping them escape. Third, all lower-level working swarms tend toward local optima: suppose the probability of a single lower-level working swarm falling into a local optimum is x%, then the probability of all of them doing so is (x%)^m, where m is the number of lower-level working swarms.
Compared with a single particle swarm, the probability that all working swarms in the large-scale bi-level structure fall into a local optimum simultaneously is greatly reduced.
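A quick numerical check of the (x%)^m argument (the values x = 0.30 and m = 5 are illustrative, not from the paper):

```python
# If each of m independent lower-level sub-swarms falls into a local
# optimum with probability x, all m do so simultaneously with
# probability x**m (illustrative values).
x, m = 0.30, 5
p_all = x ** m
print(p_all)  # ≈ 0.00243, versus 0.30 for a single swarm
```

Even under a modest per-swarm trapping probability, the joint probability shrinks geometrically with the number of sub-swarms, assuming the sub-swarms behave roughly independently.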

D. ALGORITHM STEPS
Step 1: Construct a bi-level particle swarm structure.
(1) Divide the global particle swarm into several lower-level sub-swarms of working particles.
(2) Create an upper-level decision-making particle swarm space.
Step 2: Initialize the lower-level working particle swarms: initialize the parameters, particle velocities, and positions, and calculate each particle's fitness value and individual extreme value.
Step 3: Extract the optimal particle information from each lower-level working particle swarm and provide it to the upper-level decision-making particle swarm.
Step 4: The upper-level decision-making particle swarm guides the evolutionary direction of the lower-level working particle swarms.
(1) The upper-level decision-making particle swarm calculates the decision-making information.
(2) Provide the decision information to the lower-level working particle swarms according to the exponentially distributed learning factor, and update the velocities and positions.
Step 5: The lower-level working particle swarms run in parallel and calculate the fitness values.
Step 6: Judge whether the algorithm meets the termination condition (the required number of iterations has been reached); if so, go to Step 7, otherwise go to Step 3.
Step 7: Output the global optimum, and the algorithm ends.
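Steps 1-7 can be condensed into a minimal runnable sketch. Since formulas (6)-(10) are not reproduced in this text, the decision information is taken as the mean of the sub-swarm bests and the coupling factor as an increasing exponential, as described above; all helper names, constants, and the sequential (rather than parallel) lower-level loop are illustrative simplifications.

```python
import math
import random

def sphere(x):
    """Sphere (De Jong) test function, global minimum 0 at the origin."""
    return sum(xi * xi for xi in x)

def bilevel_pso(f, dim=2, N=40, m=4, iters=60, w=0.7, c1=1.5, c2=1.5, a=3.0):
    """Minimal sketch of Steps 1-7 of the large-scale bi-level PSO."""
    size = N // m                                  # Step 1: floor(N/m) per sub-swarm
    swarms = []
    for _ in range(m):                             # Step 2: initialize sub-swarms
        X = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(size)]
        swarms.append({"X": X,
                       "V": [[0.0] * dim for _ in range(size)],
                       "P": [x[:] for x in X]})
    g_best, g_val = None, float("inf")
    for k in range(iters):
        sub_bests = []                             # Step 3: collect sub-swarm bests
        for s in swarms:
            best = min(s["P"], key=f)
            sub_bests.append(best)
            if f(best) < g_val:
                g_best, g_val = best[:], f(best)
        # Step 4: upper level forms decision info (assumed: mean of bests)
        x_xs = [sum(b[d] for b in sub_bests) / m for d in range(dim)]
        c3 = 2.0 * (math.exp(a * k / iters) - 1.0) / (math.exp(a) - 1.0)
        for s in swarms:                           # Step 5: lower-level updates
            for i in range(size):
                for d in range(dim):
                    r1, r2, r3 = (random.random() for _ in range(3))
                    s["V"][i][d] = (w * s["V"][i][d]
                                    + c1 * r1 * (s["P"][i][d] - s["X"][i][d])
                                    + c2 * r2 * (g_best[d] - s["X"][i][d])
                                    + c3 * r3 * (x_xs[d] - s["X"][i][d]))
                    s["V"][i][d] = max(-1.0, min(1.0, s["V"][i][d]))
                    s["X"][i][d] = max(-5.0, min(5.0,
                                                 s["X"][i][d] + s["V"][i][d]))
                if f(s["X"][i]) < f(s["P"][i]):
                    s["P"][i] = s["X"][i][:]
    return g_best, g_val                           # Steps 6-7: output after iters
```

On the sphere function this sketch drives the best fitness toward zero within a few dozen iterations; the paper's version additionally runs the lower-level loop in parallel threads.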

IV. SIMULATION ANALYSIS
A. PARAMETER SETTING
In this section, the large-scale bi-level particle swarm optimization method is evaluated from various aspects through an array of experiments on benchmark functions. To obtain an unbiased comparison of CPU times, all experiments were performed on the same PC, configured as shown in Table 1.
Eleven different benchmark functions were used to evaluate the proposed large-scale bi-level particle swarm optimization algorithm. The test functions are divided into two groups: unimodal and multimodal. The unimodal functions (F1-F5) are suitable for benchmarking the exploitation of algorithms, since they have one global optimum and no local optima. In contrast, the multimodal functions (F6-F11) have a massive number of local optima and are useful for examining the exploration and local-optima avoidance of algorithms. The expressions and properties of these benchmarks are presented in Tables 2 and 3, respectively. The parameters of the experimental tests are listed in Table 4.

B. COMPARISONS OF THE LARGE-SCALE BI-LEVEL PARTICLE SWARM OPTIMIZATION ALGORITHMS WITH OTHER METHODS
On the benchmark problems, the performance of the large-scale bi-level particle swarm optimization algorithm was compared with six other optimization algorithms: BPSO, adaptive particle swarm optimization (APSO) [31], elephant herding optimization (EHO) [32], the moth-flame optimization algorithm (MFO) [33], monarch butterfly optimization (MBO) [34], and the earthworm optimization algorithm (EWA) [35]. Tables 5 and 6 record the average and the best results of 30 tests, respectively. Table 5 shows that, on average, the proposed algorithm outperforms the other methods on seven of the eleven benchmarks (F1, F2, F6-F8, F10, and F11) when searching for the minimum of the function. EHO and MFO are the next most effective methods, showing the best performance on F5 and F9, and on F3 and F4, respectively. Table 6 shows that, on five of the eleven benchmarks (F1, F2, F7, F10, and F11), the proposed algorithm is better than the other methods. EHO performs better than the other algorithms on five benchmarks (F4-F6, F8, and F9), while MFO ranks third, performing best on benchmark F3 over multiple runs. These results indicate that the proposed algorithm greatly improves performance.
Furthermore, the optimization processes of all algorithms are given in Figures 5-15. The values shown in these figures are the optimal function values achieved over 30 runs; all values are true function values without normalization. Figure 5 shows the value of the F1 function obtained by the seven methods. F1 is the spherical function, also known as De Jong's function, with global minimum F1_min = 0, so it is easy to solve. Figure 5 shows that the large-scale bi-level particle swarm optimization algorithm has the fastest convergence toward the global solution, better than all other methods. Figure 6 presents the results for the F2 function, on which the proposed algorithm shows the best performance in this unimodal benchmark. Figure 7 reveals the function values for F3, the Schwefel 1.2 function: for this unimodal function, MFO performs better than the other six algorithms, EHO also shows a good convergence speed, and the large-scale bi-level particle swarm optimization algorithm ranks third. Figure 8 shows the results for the F4 function, on which the leading method converges well while the other algorithms perform poorly. Figure 9 reveals the function values for F5, the Rosenbrock function, showing that EHO has the fastest convergence speed and the best performance on this benchmark. Figure 10 illustrates the values achieved by the seven methods on F6: in the convergence graph of the optimal value, EHO converges extremely fast, and the algorithm proposed in this paper also performs well on this benchmark, providing good results and showing clear advantages, while MFO converges the slowest. Figure 11 reveals the function value of F7, the Rastrigin function, a complex multimodal function with a unique global minimum of F7_min = 0 and several local optima. When solving F7, a method may converge to a local value; therefore, a method that maintains greater diversity is more likely to produce better values. The algorithm proposed in this paper performs best, and the figure suggests that MFO may be trapped in a local optimum. Figure 12 shows the values obtained by the seven methods on the F8 function, a multimodal function with a narrow global minimum basin (F8_min = 0) and many minor local optima. Although the proposed algorithm converges slowly at the early stage, it performs well after 75 iterations, and the initial values of all methods are almost the same. In the end, the EHO algorithm surpasses the other six methods, while EWA and MBO may fall into local optima. Figure 13 displays the values for the F9 function, on which EHO performs best; as the iterations proceed, the convergence speed of the proposed algorithm gradually surpasses the other algorithms and it finally ranks third. Figure 14 reveals the values for F10, the Penalty #1 function: it can be seen from Figure 14 that the large-scale bi-level particle swarm optimization algorithm converges to the global solution faster than the other algorithms. Figure 15 shows the values achieved on the F11 function: on this benchmark, the large-scale bi-level particle swarm optimization algorithm overtakes all other approaches during optimization, very similarly to F10 in Figure 14.
As shown in Table 8, the running time of the algorithm in this paper is greatly reduced compared with the other six algorithms, making the large-scale bi-level particle swarm optimization algorithm the most efficient of the seven.

C. COMPARISONS WITH OTHER OPTIMIZATION METHODS BY USING WILCOXON'S RANK-SUM TEST
Based on the final search results of 30 independent trials on every function, Table 9 presents the key data: the p-values of Wilcoxon's rank-sum test at the 5% significance level, comparing the large-scale bi-level particle swarm optimization algorithm with the other optimization methods on every function. Although the large-scale bi-level particle swarm optimization algorithm does not provide the best results on test functions F3, F4, F5, and F9 in Table 5, the p-values in Table 9 show that its results are still very competitive.
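For reference, a two-sided Wilcoxon rank-sum p-value can be computed from two samples of final results as follows. This stdlib-only sketch uses the large-sample normal approximation and ignores ties for brevity; library routines such as `scipy.stats.ranksums` handle ties and small samples more carefully.

```python
import math

def ranksum_p(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation
    (a minimal sketch; ties are not corrected for)."""
    n1, n2 = len(sample_a), len(sample_b)
    combined = sorted((v, 0 if i < n1 else 1)
                      for i, v in enumerate(list(sample_a) + list(sample_b)))
    # rank sum of sample_a (ranks are 1-based positions in the sorted pool)
    r1 = sum(rank + 1 for rank, (_, which) in enumerate(combined)
             if which == 0)
    mu = n1 * (n1 + n2 + 1) / 2.0                       # mean of R1 under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)   # std of R1 under H0
    z = (r1 - mu) / sigma
    # two-sided p-value from the standard normal CDF
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2))))
```

Two clearly separated result samples yield a p-value below the 5% threshold, while interleaved samples do not, which is the decision rule applied in Table 9.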

D. ANALYSIS ON EXPERIMENTAL RESULTS
Compared with other algorithms, although the algorithm proposed in this paper converges slowly in the early stage, it rarely falls into local optima. Its mean fitness value and best fitness value are significantly improved, and the standard deviation reflects the good stability of the algorithm. The structural advantage of the bi-level design increases the diversity of the particle swarm, and the parallel running of the lower-level working swarms greatly reduces the over-long running time caused by the large-scale swarm. Meanwhile, the upper and lower levels coordinate to run rationally, with the upper level effectively controlling the lower level through the exponentially distributed learning factor. The algorithm demonstrates that enlarging the swarm size and improving the swarm structure can effectively avoid the premature-convergence and local-optimum problems of PSO in the later stage. In summary, the large-scale bi-level particle swarm optimization algorithm features not only high convergence accuracy but also good optimization effects and shorter running time.

V. CONCLUSION
A large-scale bi-level particle swarm optimization algorithm is proposed in this paper to address the PSO problems of poor diversity in the initial stage and local optima in the later stage. It makes use of the large-scale bi-level particle swarm design to improve swarm diversity in the initial stage and reduce the chance of being trapped in local optima in the later stage, thereby improving the stability of the algorithm. It also uses the exponentially distributed learning factor to control the coupling among the particle swarms, which improves computational efficiency. The parallel operation of the lower-level particle swarms further improves the operating efficiency of the algorithm and alleviates the over-long running time caused by the complex structure.
The simulation experiments show that the large-scale bi-level particle swarm optimization algorithm features not only satisfactory optimization effects but also improved stability.