SECTION I

INTRODUCTION

STOCHASTIC algorithms such as evolutionary algorithms (EAs) and particle swarm optimization (PSO) algorithms have been shown to be effective optimization techniques [1]. However, their performance often deteriorates rapidly as the dimensionality of the problem increases. Nevertheless, many real-world problems involve the optimization of a large number of variables. For example, in shape optimization a large number of shape design variables is often used to represent complex shapes, such as turbine blades [2], aircraft wings [3], and heat exchangers [4]. Existing EAs are often ill-equipped to handle this class of problems. To meet such a demand, research into designing EAs able to tackle large-scale optimization problems has recently gained momentum [5], [6].

PSO is notorious for being prone to premature convergence. For example, in a comparative study where several widely used stochastic algorithms such as differential evolution (DE), an EA, and PSO were evaluated [7], PSO was shown to perform very poorly as the dimensionality of the problem increased. This perception of PSO's inability to handle high-dimensional problems seems to be widely held [7], [8].

A natural approach to tackling high-dimensional optimization problems is to adopt a divide-and-conquer strategy. An early work on a cooperative coevolutionary algorithm (CCEA) by Potter and De Jong [9] provides a promising approach for decomposing a high-dimensional problem and tackling its subcomponents individually. By cooperatively coevolving multiple EA subpopulations (each dealing with a subproblem of lower dimensionality), we can obtain an overall solution derived from combinations of subsolutions evolved by the individual subpopulations. Clearly, the effectiveness of such CCEAs depends heavily on the decomposition strategy used. Classical CCEAs [9] performed poorly on nonseparable problems, because the interdependencies among different variables could not be captured well enough by the algorithms. Generally speaking, existing CCEAs still perform poorly on nonseparable problems with 100 or more real-valued variables [10].

An early attempt to apply Potter's CC model to PSO was made by Van den Bergh and Engelbrecht [8], where two cooperative PSO models, $\text{CPSO-S}_K$ and $\text{CPSO-H}_K$, were developed. However, these two models were only tested on functions of up to 30 dimensions [8] and 190 dimensions [11]. The question remains as to how well these CCPSO models scale to problems of significantly larger dimensions.

Recent studies by Yang et al. [10], [12] suggest a new decomposition strategy based on random grouping. Without prior knowledge of the nonseparability of a problem, it was shown that random grouping increases the probability of two interacting variables being allocated to the same subcomponent, thereby making it possible to optimize these interacting variables within the same subcomponent rather than across different subcomponents. An adaptive weighting scheme was also introduced to further fine-tune the solutions produced by the CCEA [10]. Inspired by these recent works, a CCPSO integrating the random grouping and adaptive weighting schemes was developed and demonstrated great promise in scaling up PSO on high-dimensional nonseparable problems [13]. This CCPSO outperformed the previously proposed $\text{CPSO-H}_K$ [8] on 30-D separable and nonseparable functions. CCPSO was also shown to perform reasonably well on these functions of up to 1000 dimensions. Nevertheless, our latest paper on CCEAs [14] revealed that it is actually more beneficial to apply random grouping more frequently than to use the adaptive weighting scheme.

Building on our preliminary work on CCPSO [13] and our new findings on random grouping [14], this paper aims to demonstrate convincingly that a CC approach is an effective divide-and-conquer strategy and can be used to help scale up PSO's performance in solving problems with a large number of variables. The proposed CCPSO2 described in this paper enhances the previously proposed CCPSO substantially. CCPSO2 differs from CCPSO in the following aspects.

  1. A new PSO model using Cauchy and Gaussian distributions for sampling around the personal best and the neighborhood best, respectively, is proposed. This PSO model, using an $lbest$ ring topology, shows improved search capability compared with existing PSO models (see Section VI-A).
  2. In the context of a CC framework, an $lbest$ ring topology is used to define a local neighborhood instead of the standard $gbest$ model, to improve performance, especially on multimodal optimization problems.
  3. A new strategy for updating personal bests and the global best in the context of a CC framework is developed (see Section IV-D).
  4. The adaptive weighting scheme is removed, since our recent work [14] shows that it is more cost effective to apply random grouping more frequently instead.
  5. A new adaptive scheme is used to dynamically determine the subcomponent sizes for random grouping during a run, hence removing the need to specify this parameter.
  6. A comprehensive study comparing CCPSO2 with another state-of-the-art global optimization algorithm sep-CMA-ES on high-dimensional functions is provided. In addition, comparisons were made with two existing PSO algorithms and a CC DE which were specially designed for handling large-scale optimization problems.
  7. Results on functions of up to 2000 dimensions are presented.

CCPSO2 offers three major benefits: it employs the Cauchy- and Gaussian-based update rules in conjunction with an $lbest$ topology, hence its search capability is enhanced; it is more reliable and robust to use, since a user does not have to specify the subcomponent size, which is adaptively chosen from a set; and, as our results show, it performs significantly better than the state-of-the-art algorithm sep-CMA-ES and two existing PSO models on complex multimodal functions of up to 2000 dimensions.

The rest of this paper is organized as follows. Section II presents an overview of existing CCEAs, specifically those tested on nonseparable or high-dimensional problems. Random grouping as a novel decomposition method for CCEAs is also described. Section III provides the rationale for why PSO is an appropriate choice for constructing a CC model, and reviews a previous study on CCPSO models upon which our newly proposed CCPSO2 is built. Section IV introduces the new CCPSO2 algorithm, including the Cauchy and Gaussian PSO (CGPSO) adopted and a new scheme that allows subcomponent sizes to change dynamically during a run. Section V describes the experimental setup, followed by Section VI presenting experimental results and analysis. Finally, Section VII gives the concluding remarks.

SECTION II

COOPERATIVE COEVOLUTION

A. Early Work

The first CCEA for function optimization, called CCGA, was proposed by Potter and De Jong [9], where the algorithm was empirically evaluated on six test functions of up to 30 dimensions. However, no attempt was made to apply the cooperative coevolution (CC) framework to higher dimensional problems. More recently, the idea of using CC in optimization has attracted much attention and has been incorporated into several algorithms, including evolutionary programming [15], evolution strategies [16], PSO [8], and DE [10], [12], [17].

In their original CCGA, Potter and De Jong [9] decomposed a problem into several smaller subcomponents, each evolved by a separate GA subpopulation. As each subpopulation is evolved, the remaining subpopulations are held fixed, and the subpopulations are evolved in a round-robin fashion. For a function optimization problem of $n$ variables, Potter and De Jong [9] decomposed the problem into $n$ subcomponents, corresponding to $n$ subpopulations (one for each variable). The fitness of a subpopulation member is determined by the $n$-dimensional vector formed by this member and selected members from the other subpopulations. In a way, the fitness of a subpopulation member is assessed by how well it "cooperates" with the other subpopulations. Two models of cooperation were examined. In the first model, CCGA-1, the fitness of a subpopulation member is computed by combining it with the current best members of the other subpopulations. It was found that CCGA-1 performed significantly better than a conventional GA on separable problems, but much worse on nonseparable problems. To improve CCGA's performance on nonseparable problems, CCGA-2 was proposed, where members are randomly selected from the other subpopulations in the fitness evaluation. On a 2-D Rosenbrock function, CCGA-2 was shown to perform better than CCGA-1. In summary, Potter and De Jong's original study [9] demonstrated the efficacy of the CC framework applied to function optimization. However, the CCGA framework was tested only on problems of up to 30 dimensions.

Liu et al. [15] applied the CC framework to their fast evolutionary programming (FEP) algorithm. The new algorithm, FEP with CC (FEPCC), was able to optimize benchmark functions with 100 to 1000 real-valued variables. However, for one of the nonseparable functions, FEPCC performed poorly and was trapped in a local optimum, confirming the deficiency of Potter and De Jong's decomposition strategy [9] in handling variable interactions.

Van den Bergh and Engelbrecht [8] first introduced the CC framework to PSO. Two cooperative PSO algorithms, $\text{CPSO-S}_K$ and $\text{CPSO-H}_K$, were developed. $\text{CPSO-S}_K$ adopts the same framework as Potter's CCGA, except that it allows a vector to be split into $K$ subcomponents, instead of each subcomponent consisting of a single dimension. $\text{CPSO-H}_K$ is a hybrid approach combining a standard PSO with $\text{CPSO-S}_K$. These two CPSO algorithms were tested on benchmark problems of up to 30 dimensions (and 190 dimensions in [11]). Some rotated test functions with variable interactions were also used. The results demonstrated that correlation among variables in such problems reduces the effectiveness of the two CPSO algorithms. However, no new decomposition strategies were proposed for handling high-dimensional nonseparable problems. A similar cooperative approach was also adopted in [18], but was implemented for bacterial foraging optimization.

New decomposition strategies were proposed and investigated for DE with CC [10], [12], [17], [19]. A splitting-in-half strategy was proposed by Shi et al. [17], which decomposed the search space into two subcomponents, each evolved by a separate subpopulation. Clearly, this strategy does not scale up very well and quickly loses its effectiveness when the number of dimensions becomes very large. Yang et al. [10], [12] proposed a decomposition strategy based on random grouping of variables and applied it to a CC DE on high-dimensional nonseparable problems with up to 1000 real-valued variables. The proposed algorithms, DECC-G [10] and subsequently MLCC [19], outperformed several existing algorithms significantly. This random grouping strategy represents an important step forward in handling nonseparable high-dimensional problems, and will be incorporated into our proposed CCPSO algorithm. We describe this random grouping technique in detail in Section II-B.

B. Random Grouping of Variables

In $\text{CPSO-S}_K$ [8], the $n$-dimensional search space is decomposed into $K$ subcomponents, each corresponding to a swarm of $s$ dimensions (where $n = K \times s$). However, the $s$ variables in any given swarm remain in the same swarm over the course of optimization. Since it is not always known in advance how these $K$ subcomponents are related for any given problem, it is likely that such a static grouping method places some interacting variables into different subcomponents. Because CCEAs work better if interacting variables are placed within the same subcomponent, instead of across different subcomponents, this static grouping method is likely to encounter difficulty in dealing with nonseparable problems.

One method to alleviate this problem is to dynamically change the grouping structure [10]. We call this method random grouping; it is the simplest dynamic grouping method and does not assume any prior knowledge of the problem to be optimized. Here, if we randomly decompose the $n$-dimensional object vector into $K$ subcomponents at each iteration, i.e., we construct each of the $K$ subcomponents by randomly selecting $s$ dimensions from the $n$-dimensional object vector, the probability of placing two interacting variables into the same subcomponent becomes higher over an increasing number of iterations. For example, for a problem of 1000 dimensions, if $K = 10$ (hence $s = n/K = 100$), the probability of placing two interacting variables into the same subcomponent in one iteration is $p = 10 \times \left(\frac{1}{10}\right)^{2} = 0.1$. If we run the algorithm for 50 iterations, 50 executions of random grouping will occur. The probability of optimizing the two variables in the same subcomponent for at least one iteration follows a binomial distribution and can be computed as follows:
$$P(x \geq 1) = p(1) + p(2) + \cdots + p(50) = 1 - p(0) = 1 - \binom{50}{0}(0.1)^{0}(1 - 0.1)^{50} \approx 0.9948$$
where $x$ denotes the number of observed "successes" of placing the two variables in the same subcomponent over the 50 trials; $p(1)$ denotes the probability of having one such "success" over the 50 iterations, $p(2)$ the probability of having two such "successes," and so on. This suggests the random grouping strategy should help when some variable interactions are present in a problem. Our recent paper [14] further generalizes the above probability calculation to cases where more than two interacting variables are present.
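To make the calculation above concrete, the following minimal Python sketch (not from the paper; variable names are illustrative) reproduces the analytical probability and cross-checks it with a small Monte Carlo simulation of the random grouping step.

```python
# A minimal sketch (not from the paper): the random-grouping probability example
# above, computed analytically and checked with a small Monte Carlo simulation.
import random

n, K = 1000, 10          # problem dimensions and number of subcomponents
s = n // K               # subcomponent (group) size
iterations = 50

# Probability that two fixed interacting variables land in the same
# subcomponent in a single random grouping (K equally likely groups).
p_single = K * (1.0 / K) ** 2                      # = 1/K = 0.1

# Probability of at least one "success" over 50 independent groupings.
p_at_least_once = 1.0 - (1.0 - p_single) ** iterations
print(f"analytical : {p_at_least_once:.4f}")       # ~0.9948

# Monte Carlo check: permute the n indices and split them into K groups of s.
def same_group(i, j):
    perm = random.sample(range(n), n)              # a random permutation
    group_of = {idx: pos // s for pos, idx in enumerate(perm)}
    return group_of[i] == group_of[j]

trials = 2000
hits = sum(any(same_group(0, 1) for _ in range(iterations)) for _ in range(trials))
print(f"simulated  : {hits / trials:.4f}")         # close to the value above
```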

SECTION III

PARTICLE SWARM OPTIMIZATION

PSO is modeled on an abstract framework of “collective intelligence” in social animals [1], [20]. In PSO, individual particles of a swarm represent potential solutions, which “fly” through the problem search space seeking the optimal solution. These particles broadcast their current positions to neighboring particles. Previously identified “good positions” are then used by the swarm as a starting point for further search, where individual particles adjust their current positions and velocities.

A distinct characteristic of PSO is its fast convergent behavior and inherent adaptability, especially when compared to conventional EAs. Theoretical analysis of PSO [20] has shown that particles in a swarm can switch between an exploratory (with large search step sizes) and an exploitative (with smaller search step sizes) mode, responding adaptively to the shape of the fitness landscape. This characteristic makes PSO an ideal candidate to be incorporated into the CC framework for handling problems of high complexity and dimensionality.

In a canonical PSO, the velocity of each particle is modified iteratively by its personal best position (i.e., the position giving the best fitness value so far) and the global best position (i.e., the position of the best-fit particle from the entire swarm). As a result, each particle searches around a region defined by its personal best position and the global best position. Let $\mathbf{v}_i$ denote the velocity of the $i$th particle in the swarm, $\mathbf{x}_i$ its position, $\mathbf{y}_i$ its personal best position, and $\hat{\mathbf{y}}$ the global best position of the entire swarm. Each $d$th dimension of $\mathbf{v}_i$ and $\mathbf{x}_i$ of the $i$th particle is updated according to the following two equations [20]:
$$v_{i,d}(t+1) = \chi\big(v_{i,d}(t) + c_1 r1_{i,d}(t)\,(y_{i,d}(t) - x_{i,d}(t)) + c_2 r2_{i,d}(t)\,(\hat{y}_d(t) - x_{i,d}(t))\big) \quad (1)$$
$$x_{i,d}(t+1) = x_{i,d}(t) + v_{i,d}(t+1) \quad (2)$$
for all $i \in \{1, \ldots, swarmSize\}$ and $d \in \{1, \ldots, n\}$ (where $swarmSize$ is the population size of the swarm and $n$ is the number of dimensions). $c_1$ and $c_2$ are acceleration coefficients, and $r1_{i,d}$ and $r2_{i,d}$ are two random values independently and uniformly generated from the range $[0, 1]$. A constriction coefficient $\chi$ is used to prevent each particle from exploring too far away in the search space, since $\chi$ applies a dampening effect to the oscillation size of a particle over time. This "Type 1" constricted PSO suggested by Clerc and Kennedy is often used with $\chi$ set to 0.7298, and $c_1$ and $c_2$ set to 2.05 [20].
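As a reference for the reader, the following short Python sketch (assumed variable names, not the authors' code) implements one synchronous step of the constricted PSO update in (1) and (2) with the parameter values quoted above.

```python
# A minimal sketch of the "Type 1" constricted PSO update of (1)-(2),
# with chi = 0.7298 and c1 = c2 = 2.05; names are illustrative.
import numpy as np

def constricted_pso_step(x, v, y, y_hat, chi=0.7298, c1=2.05, c2=2.05):
    """One synchronous update of all particles.

    x, v, y : arrays of shape (swarm_size, n) with positions, velocities,
              and personal bests; y_hat : array of shape (n,), the global best.
    """
    swarm_size, n = x.shape
    r1 = np.random.rand(swarm_size, n)    # uniform in [0, 1], per dimension
    r2 = np.random.rand(swarm_size, n)
    v_new = chi * (v + c1 * r1 * (y - x) + c2 * r2 * (y_hat - x))   # eq. (1)
    x_new = x + v_new                                               # eq. (2)
    return x_new, v_new
```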

A. $\text{CPSO-S}_K$ and $\text{CPSO-H}_K$

Van den Bergh and Engelbrecht [8] developed two cooperative PSO algorithms. In the first CPSO variant, $\text{CPSO-S}_K$, they adopted the original decomposition strategy of Potter and De Jong [9], but allowed a vector to be split into $K$ subcomponents, each corresponding to a swarm of $s$ dimensions (where $n = K \times s$). Algorithm 1 illustrates $\text{CPSO-S}_K$ [8]. In order to evaluate the fitness of a particle in a swarm, a context vector $\hat{\mathbf{y}}$ is constructed, which is a concatenation of the global best particles from all $K$ swarms (as shown in Fig. 1). The evaluation of the $i$th particle in the $j$th swarm is done by calling the function $\mathbf{b}(j, P_j.\mathbf{x}_i)$, which returns an $n$-dimensional vector consisting of $\hat{\mathbf{y}}$ with its $j$th component replaced by $P_j.\mathbf{x}_i$. The idea is to evaluate how well $P_j.\mathbf{x}_i$ "cooperates" with the best individuals from all other swarms.
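The following is a hedged Python sketch (function and variable names are assumptions for illustration) of the context-vector mechanism just described: b(j, x) plugs a candidate subvector of swarm j into the concatenation of the other swarms' best subvectors before evaluation.

```python
# A hedged sketch of context-vector evaluation in a cooperative swarm
# (illustrative names; not the authors' implementation).
import numpy as np

def build_context_vector(swarm_bests):
    """Concatenate the K swarms' best subvectors (cf. Fig. 1) into one n-D vector."""
    return np.concatenate(swarm_bests)

def b(j, x_sub, swarm_bests):
    """n-D vector: the context vector with swarm j's part replaced by x_sub."""
    parts = [x_sub if k == j else best for k, best in enumerate(swarm_bests)]
    return np.concatenate(parts)

def cooperative_fitness(f, j, x_sub, swarm_bests):
    """Fitness of a particle of swarm j, judged by how well it 'cooperates'."""
    return f(b(j, x_sub, swarm_bests))
```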

Algorithm 1
Fig. 1. Concatenation of $P_1.\hat{\mathbf{y}}, P_2.\hat{\mathbf{y}}, \ldots, P_K.\hat{\mathbf{y}}$ constitutes $\hat{\mathbf{y}}$.

Note that if $K$ equals $n$, $\text{CPSO-S}_K$ operates the same way as Potter's CCGA-1, where $n$ subpopulations of 1-D vectors are coevolved.

In their second variant, $\text{CPSO-H}_K$, both $\text{CPSO-S}_K$ and a standard PSO are used in an alternating manner, with $\text{CPSO-S}_K$ executed for one iteration, followed by the standard PSO in the next iteration. Information exchange between $\text{CPSO-S}_K$ and the standard PSO is allowed so that the best solution found so far can be shared. To be specific, after an iteration of $\text{CPSO-S}_K$, the context vector $\hat{\mathbf{y}}$ is used to replace a randomly chosen particle in the standard PSO. This is followed by one iteration of the standard PSO, which may yield a new global best solution. This new best solution can then be used to update the subvectors of a randomly chosen particle from $\text{CPSO-S}_K$.

Both $\text{CPSO-S}_K$ and $\text{CPSO-H}_K$ were tested on functions of up to 30 dimensions [8]; however, it is unclear how well the performance of $\text{CPSO-S}_K$ and $\text{CPSO-H}_K$ scales to functions of higher dimensions. In our preliminary study of cooperatively coevolving PSO [13], the previously proposed CCPSO, which employed both the random grouping and adaptive weighting schemes, not only outperformed $\text{CPSO-S}_K$ on several 30-D functions, but also showed promising performance on these functions of up to 1000 dimensions.

SECTION IV

NEW CCPSO2 ALGORITHM

This paper proposes CCPSO2, which builds upon the previously proposed CCPSO [13]. CCPSO2 incorporates several new schemes to improve the performance and reliability of CCPSO. First, we adopt a PSO that does not use the velocity term, but instead employs Cauchy and Gaussian distributions to generate the next particle positions. Second, we use an $lbest$ ring topology to define the local neighborhood of each particle in order to slow down convergence and maintain better population diversity. Third, instead of using a fixed group (or subcomponent) size throughout a run for the random grouping mechanism, a different group size can be randomly chosen from a set at each iteration. There is clear evidence that applying random grouping together with adaptively chosen subcomponent sizes contributed to the marked improvements of CCEAs [14], [19].

In the following subsections, we first review several studies we have drawn upon to propose the new $lbest$ PSO model using Cauchy and Gaussian distributions for sampling, then describe a simple scheme for dynamically changing the group size when random grouping is applied, and finally show how these are put together to form the new CCPSO2 algorithm.

A. Related Studies

1) Gaussian-Based PSO

The two most commonly used PSO variants are probably the inertia-weight PSO and the constricted PSO. An early study [20] suggested that the two are equivalent to each other. Other studies suggested that a Gaussian distribution could be used in the PSO position update rule. For example, Kennedy proposed the bare-bones PSO [21], where each dimension of a particle's new position is sampled from a Gaussian distribution whose mean is the average of $y_{i,d}(t)$ and $\hat{y}_d(t)$ and whose standard deviation $\sigma$ is the distance between $y_{i,d}(t)$ and $\hat{y}_d(t)$:
$$x_{i,d}(t+1) = \mathcal{N}\!\left(\frac{y_{i,d}(t) + \hat{y}_d(t)}{2},\ \big|y_{i,d}(t) - \hat{y}_d(t)\big|\right). \quad (3)$$

Note that there is no velocity term used in (3); the new particle position is simply generated via the Gaussian distribution. A comparative study on PSO variants employing the Gaussian distribution was provided in [22], including a Lévy distribution, which is a more generalized form of distribution than the Gaussian and Cauchy distributions. Algorithms employing Lévy or Cauchy distributions, which both have a long fat tail, are more capable of escaping from local optima than their Gaussian counterparts, as suggested in several studies [22], [23], [24].
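For concreteness, a minimal sketch of the bare-bones position update in (3) is given below (assumed names; numpy's vectorized Gaussian sampling is applied to all particles and dimensions at once).

```python
# A minimal, assumed sketch of the bare-bones PSO update of (3): each dimension
# is resampled from a Gaussian centered at the midpoint of the personal best
# and the global best, with sigma equal to their distance.
import numpy as np

def bare_bones_step(y, y_hat):
    """y: personal bests, shape (swarm_size, n); y_hat: global best, shape (n,)."""
    mean = (y + y_hat) / 2.0
    sigma = np.abs(y - y_hat)
    return np.random.normal(mean, sigma)   # new positions; no velocity term
```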

Without the velocity term, the Gaussian-based PSO (GPSO), as shown in (3), becomes similar to evolutionary programming (EP), where a Gaussian distribution is typically used to generate the next trial points in the search space [23]. One important difference, though, is that in the Gaussian-based PSO the standard deviation $\sigma$ is determined by the distance between $y_{i,d}$ and $\hat{y}_d$, whereas in a typical EP, $\sigma$ needs to be supplied by the user or made self-adaptive [23], [24].

In another GPSO, proposed by Secrest and Lamont [25], instead of sampling around the midpoint between $\mathbf{y}_i$ and $\hat{\mathbf{y}}$, a Gaussian distribution is used to sample around $\hat{\mathbf{y}}$ with some prespecified probability $p$, and otherwise around $\mathbf{y}_i$. This proves to be beneficial, as particles can explore better over a much wider area, rather than just around the midpoint between $\mathbf{y}_i$ and $\hat{\mathbf{y}}$. However, since GPSO uses only a Gaussian distribution, its ability to explore the search space is rather limited, especially when the standard deviation $\sigma$ becomes very small. In Section VI-A, our experiments will demonstrate that using a combination of Cauchy and Gaussian distributions is superior to using only a Gaussian distribution for sampling, as also explained in Section V.

2) $lbest$ PSO

One of the earliest PSO algorithms proposed was in fact an $lbest$ PSO based on the ring topology [26]. A later study on PSO using a variety of neighborhood topologies [27] showed that the $lbest$ PSO based on the ring topology has a slower convergence speed than the more widely used $gbest$ PSO model. It is this slow convergence property that allows the $lbest$ PSO to outperform the $gbest$ model on a wide range of multimodal functions, though on unimodal functions the $gbest$ model is still likely to be the winner [28].

B. Cauchy and Gaussian-Based PSO

To solve large-scale optimization problems, an optimization algorithm needs to maintain its ability to explore effectively as well as to converge. Drawing on the findings of the aforementioned PSO variants, we propose a PSO model that employs both Cauchy and Gaussian distributions for sampling, as well as an $lbest$ ring topology. The update rule for each particle position is rewritten as follows:
$$x_{i,d}(t+1) = \begin{cases} y_{i,d}(t) + \mathcal{C}(1)\,\big|y_{i,d}(t) - \hat{y}'_{i,d}(t)\big|, & \text{if } rand \leq p\\ \hat{y}'_{i,d}(t) + \mathcal{N}(0,1)\,\big|y_{i,d}(t) - \hat{y}'_{i,d}(t)\big|, & \text{otherwise} \end{cases} \quad (4)$$
where $\mathcal{C}(1)$ denotes a number generated from a Cauchy distribution; in this case we also need to set an "effective standard deviation" [22] for the Cauchy distribution, which is the same standard deviation we would set for the equivalent Gaussian distribution, $|y_{i,d}(t) - \hat{y}'_{i,d}(t)|$. $rand$ is a random number generated uniformly from $[0, 1]$, and $p$ is a user-specified probability that Cauchy sampling occurs. Here, $\hat{\mathbf{y}}'_i$ denotes the local neighborhood best for the $i$th particle. Since an $lbest$ ring topology is used for defining the local neighborhood, $\hat{\mathbf{y}}'_i$ (i.e., the best-fit particle) is chosen among three particles: the current $i$th particle and its immediate left and right neighbors (imagine that all particles are stored on an indexed, wrapped-around list). Since each particle may have a different $\hat{\mathbf{y}}'_i$, the population is likely to remain diverse for a longer period, and the chance of prematurely converging to a single global best $\hat{\mathbf{y}}$, as in a GPSO (3), should be reduced. Note that $p$ can simply be set to 0.5, so that half of the time Cauchy sampling is used around the $i$th particle's personal best $\mathbf{y}_i$ (more exploratory), while for the other half of the time Gaussian sampling is used around its neighborhood best $\hat{\mathbf{y}}'_i$ (less exploratory).
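A minimal Python sketch of the CGPSO update rule (4) with the $lbest$ ring neighborhood is given below; the helper names and the per-dimension application of $rand$ are assumptions made for illustration.

```python
# A hedged sketch of CGPSO (4): Cauchy sampling around the personal best with
# probability p, otherwise Gaussian sampling around the ring-neighborhood best.
import numpy as np

def ring_neighborhood_best(y, y_fitness):
    """Best personal best among particle i and its left/right ring neighbors."""
    sw = len(y)
    y_prime = np.empty_like(y)
    for i in range(sw):
        neighbors = [(i - 1) % sw, i, (i + 1) % sw]
        best = min(neighbors, key=lambda k: y_fitness[k])   # minimization
        y_prime[i] = y[best]
    return y_prime

def cgpso_step(y, y_prime, p=0.5):
    """New positions from eq. (4); y and y_prime have shape (sw, n)."""
    sw, n = y.shape
    scale = np.abs(y - y_prime)                   # per-dimension "std deviation"
    cauchy = y + np.random.standard_cauchy((sw, n)) * scale
    gauss = y_prime + np.random.standard_normal((sw, n)) * scale
    use_cauchy = np.random.rand(sw, n) <= p       # rand <= p (per dimension here)
    return np.where(use_cauchy, cauchy, gauss)
```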

For the rest of the paper, this CGPSO is used as the subcomponent optimizer for each swarm in the context of a CC framework.

C. Dynamically Changing Group Size

When applying random grouping to a CCEA, a group size $s$ (i.e., the number of variables in each subcomponent) has to be chosen. Obviously, the choice of $s$ has a significant impact on the performance of CCPSO2. Furthermore, it might be desirable to vary $s$ during a run. For example, it is possible to start with a smaller $s$ value and then gradually increase it over the course of a run, in order to encourage convergence to a final global solution and to consider more potential variable interactions.

This issue was studied in a recently developed multilevel cooperative coevolution (MLCC) [19], where a scheme was proposed to probabilistically choose a group size from a set of potential group sizes. At the end of each coevolutionary cycle, a performance record list is used to update the probability values so that the group sizes associated with higher performances are rewarded with higher probability values accordingly. As a result, these more “successful” group sizes are more likely to be used again in future cycles.

This paper adopts an even simpler approach. At each iteration, we record the fitness value of the global best $\hat{\mathbf{y}}$ before and after a coevolutionary cycle, and if there is no improvement in the fitness value, a new $s$ is chosen uniformly at random from a set $\mathbf{S}$; otherwise, the value of $s$ remains unchanged. Here, $\mathbf{S}$ contains several possible $s$ values ranging from small to large, e.g., $\mathbf{S} = \{2, 5, 50, 100, 200\}$. The idea is to continue using the same $s$ value as long as it works well, but if it does not, a different value of $s$ should be chosen. This simple scheme is demonstrated to work well experimentally (see Section VI-B). Although the set $\mathbf{S}$ still needs to be supplied, the parameter $s$ itself no longer needs to be specified, and the $s$ values in $\mathbf{S}$ are applicable to any function of dimensions up to the largest value in $\mathbf{S}$.
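The rule just described can be sketched in a few lines of Python (the names and the example set S are illustrative, assuming minimization):

```python
# A hedged sketch of the adaptive group-size rule: keep s while the global best
# keeps improving; otherwise draw a new s uniformly at random from S.
import random

S = [2, 5, 50, 100, 200]    # candidate group sizes (example set)

def choose_group_size(s, best_before, best_after):
    """Group size for the next coevolutionary cycle (minimization assumed)."""
    improved = best_after < best_before
    return s if improved else random.choice(S)

# usage: s = choose_group_size(s, f_of_y_hat_before_cycle, f_of_y_hat_after_cycle)
```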

One important difference between CCPSO2 and MLCC [19] is that the frequency of applying the random grouping method in CCPSO2 is likely to be much greater than in MLCC. In CCPSO2, only one evolutionary step is executed over each subpopulation in a cycle, whereas in MLCC it is common for several evolutionary steps to be applied to each subpopulation. Given a fixed number of evaluations, random grouping is therefore likely to be applied more frequently in CCPSO2 than in MLCC. A higher frequency of applying random grouping should provide more benefit on nonseparable high-dimensional problems, since it is more likely that interacting variables are captured in the same subcomponent of a CCEA [14].

D. CCPSO2

1) Basic Algorithm

Algorithm 2 summarizes the proposed CCPSO2, which employs random grouping with a dynamically changing group size $s$, as well as the Cauchy and Gaussian update rule (as given in (4)) for a ring-topology-based $lbest$ PSO. Two nested loops are used to iterate through each swarm and each particle in that swarm. In the first nested loop, for the $i$th particle in the $j$th swarm, the personal best of $P_j.\mathbf{x}_i$ is first checked for update, and similarly for the $j$th swarm best $P_j.\hat{\mathbf{y}}$. The function $localBest(\cdot)$ returns $P_j.\hat{\mathbf{y}}'_i$, which is the best-fit particle in the local neighborhood of the $i$th particle of the $j$th swarm. The $j$th swarm best $P_j.\hat{\mathbf{y}}$ is used to update the context vector $\hat{\mathbf{y}}$ if it is better. In the second nested loop, each particle's personal best, neighborhood best, and corresponding swarm best are used to calculate the particle's next position using (4).

Algorithm 2

2) Updating Personal Bests

In order to update personal bests effectively and in a coherent fashion over iterations, a new scheme is proposed here. Two matrices, $\mathbf{X}$ and $\mathbf{Y}$, are used to store all information about the particles' current positions and personal bests in all $K$ swarms. For example, the $i$th row of $\mathbf{X}$ is an $n$-dimensional vector concatenating all current position vectors $\mathbf{x}_i$ from all $K$ swarms (note that $sw$ denotes $swarmSize$):
$$\mathbf{X} = \begin{bmatrix} x_{1,1} & x_{1,2} & \ldots & x_{1,n}\\ x_{2,1} & x_{2,2} & \ldots & x_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{sw,1} & x_{sw,2} & \ldots & x_{sw,n} \end{bmatrix} \quad (5)$$
and
$$\mathbf{Y} = \begin{bmatrix} y_{1,1} & y_{1,2} & \ldots & y_{1,n}\\ y_{2,1} & y_{2,2} & \ldots & y_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ y_{sw,1} & y_{sw,2} & \ldots & y_{sw,n} \end{bmatrix}. \quad (6)$$

When random grouping is applied, the indices of all columns are randomly permuted. These permuted indices are then used to construct $K$ swarms over the $n$ dimensions from $\mathbf{X}$ and $\mathbf{Y}$. This is achieved by simply taking out groups of $s$ columns (or dimensions) of $\mathbf{X}$ and $\mathbf{Y}$ to form new swarms. In other words, in each swarm a particle's position and personal best vectors are constructed according to the newly permuted dimension indices. As a result of random grouping, the personal best of each newly formed particle in a swarm needs to be re-evaluated in order to obtain its correct "coevolving" fitness. This is a key difference from the conventional personal best update. All the experiments in later sections take these evaluations into account.
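The column-permutation step described above can be sketched as follows (a hedged illustration with assumed names; the slicing into groups of s columns and the bookkeeping of original indices mirror the description, not the authors' exact code):

```python
# A minimal sketch of random grouping over the X and Y matrices: permute the
# dimension indices, slice them into K groups of s columns, and keep the
# original indices so candidates can be restored to their original order.
import numpy as np

def random_grouping(X, Y, s):
    """X, Y: (swarm_size, n) matrices of positions and personal bests."""
    n = X.shape[1]
    perm = np.random.permutation(n)                    # permuted dimension indices
    groups = [perm[k:k + s] for k in range(0, n, s)]
    return [(X[:, g], Y[:, g], g) for g in groups]     # keep original indices g

def restore_order(vector_parts, index_parts, n):
    """Reassemble an n-D candidate in the original dimension order."""
    full = np.empty(n)
    for part, idx in zip(vector_parts, index_parts):
        full[idx] = part
    return full
```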

Similarly, $\hat{\mathbf{y}}$ should also be reconstructed according to the permuted dimension indices. This way, when forming new swarms based on the newly permuted indices, the better position values found so far in certain dimensions can be used meaningfully to guide the search in future iterations.

Note that the original dimension indices are recorded so that when evaluating the $i$th particle of the $j$th swarm via the function $\mathbf{b}(j, P_j.\mathbf{x}_i)$ (which returns an $n$-dimensional vector consisting of $\hat{\mathbf{y}}$ with its $j$th component replaced by $P_j.\mathbf{x}_i$), the original dimension order of this $n$-dimensional vector can always be restored before invoking the evaluation function.

After evaluations of all particles in the $j$th swarm, if the swarm best $P_j.\hat{\mathbf{y}}$ is found to be better than the context vector $\hat{\mathbf{y}}$, then the corresponding part of $\hat{\mathbf{y}}$ gets replaced by $P_j.\hat{\mathbf{y}}$. This will ensure good information is recorded in $\hat{\mathbf{y}}$ and utilized in future iterations.

At the end of an iteration, after applying (4), the new current and personal best vectors are saved back to $\mathbf{X}$ and $\mathbf{Y}$. This will ensure that the next iteration can use the updated $\mathbf{X}$ and $\mathbf{Y}$ to start the process of applying random grouping again.

SECTION V

EXPERIMENTAL STUDIES

This section first describes the test functions and performance measurements adopted, and then describes sep-CMA-ES, a state-of-the-art evolution strategy used in our comparative studies.

A. Experimental Setup

We adopt the seven benchmark test functions proposed for the CEC'08 special session on large-scale global optimization (LSGO) [5]. For each of these functions, the global optimum is shifted by a different value in each dimension. While $f_1$ (ShiftedSphere), $f_4$ (ShiftedRastrigin), and $f_6$ (ShiftedAckley) are separable functions, $f_2$ (SchwefelProblem), $f_3$ (ShiftedRosenbrock), $f_5$ (ShiftedGriewank), and $f_7$ (FastFractal) are nonseparable, presenting a greater challenge to any algorithm that is sensitive to variable interactions. Note that $f_5$ (ShiftedGriewank) becomes easier to optimize as the number of dimensions increases, because its product component becomes increasingly insignificant [29], making it more like a separable function (as the variable interactions are almost negligible). In addition to the above seven CEC'08 functions, we also include four more functions, $f_{3r}$, $f_{4r}$, $f_{5r}$, and $f_{6r}$, which are rotated versions of $f_3$, $f_4$, $f_5$, and $f_6$, respectively. The rotation introduces further variable interactions, making these functions nonseparable. Rotations are performed in the decision space, on each plane, using a random uniform rotation matrix [30], [31]. A new random uniform rotation matrix is generated for each individual run for the purpose of an unbiased assessment. Using these benchmark test functions and the proposed set of evaluation criteria, we are able to compare CCPSO2 with other existing EAs.
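As an aside, one common way to obtain a random uniform rotation matrix is via the QR decomposition of a Gaussian matrix; the sketch below is an assumption made for illustration and not necessarily the exact plane-rotation procedure of [30], [31].

```python
# A hedged sketch: a random orthogonal (rotation) matrix via QR decomposition,
# used to rotate a base function and thereby introduce variable interactions.
import numpy as np

def random_rotation_matrix(n, rng=None):
    """Random orthogonal matrix, approximately Haar (uniformly) distributed."""
    rng = np.random.default_rng() if rng is None else rng
    A = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(A)
    Q *= np.sign(np.diag(R))      # sign fix so the distribution is uniform
    return Q

def rotated(f, Q):
    """Wrap a base function f so that it is evaluated on the rotated input."""
    return lambda x: f(Q @ x)
```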

Experiments were conducted on the above 11 test functions of 100, 500, and 1000 dimensions. The same performance measurements used for the CEC'08 special session on LSGO were adopted [5]. For each test function, the averaged results of 25 independent runs were recorded. For each run, Max_FES (i.e., the maximum number of fitness evaluations) was set to $5000 \times n$ (where $n$ is the number of dimensions). A two-tailed $t$-test was conducted with the null hypothesis stating that there is no difference between the two algorithms being compared. The null hypothesis was rejected if the $p$-value was smaller than the significance level $\alpha = 0.05$.
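The significance test described above amounts to the following check (a minimal sketch with illustrative names, using SciPy's independent two-sample t-test):

```python
# A minimal sketch of the two-tailed t-test used to compare two algorithms:
# the null hypothesis of equal means is rejected when the p-value is below 0.05.
from scipy import stats

def significantly_different(results_a, results_b, alpha=0.05):
    """results_a, results_b: best fitness values from the 25 runs of each algorithm."""
    t_stat, p_value = stats.ttest_ind(results_a, results_b)
    return p_value < alpha, p_value
```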

The population size of each swarm that participates in coevolution was set to 30. For random grouping of the variables, we used $\mathbf{S} = \{2, 5, 10, 50, 100\}$ for the 100-D functions, and $\mathbf{S} = \{2, 5, 10, 50, 100, 250\}$ for the 500-D and 1000-D functions, where $\mathbf{S}$ includes a range of possible group size values that can be dynamically chosen. A few further experiments on $f_1$, $f_3$, and $f_7$ of 2000 dimensions were also carried out, using the same setup as mentioned above.

The CGPSO described in Section IV-B was used in all experiments of CCPSO2, with $p$ in (4) simply set to 0.5.

B. sep-CMA-ES

The CMA-ES algorithm, which makes use of adaptive mutation parameters by computing a covariance matrix and hence correlated step sizes in all dimensions, has been shown to be an efficient optimization method [32], [33]. However, most of the published results for CMA-ES are on functions of up to 100 dimensions [33], [34]. One major drawback of CMA-ES is the cost of calculating the covariance matrix, which has a complexity of $O(n^2)$. As the dimensionality increases, this cost rises rapidly. Furthermore, sampling from a multivariate normal distribution and factorizing the covariance matrix also become increasingly expensive. In a recent effort to alleviate this problem [35], a simple modification to CMA-ES was introduced in which only the diagonal elements of the covariance matrix $\mathbf{C}$ are calculated. Thus, the complexity of updating $\mathbf{C}$ becomes linear in $n$. Since only the diagonal of $\mathbf{C}$ is utilized, the sampling becomes independent in each dimension, making this CMA-ES variant (called sep-CMA-ES) no longer rotationally invariant. The tradeoff is a much reduced cost at the expense of a reduced ability to handle nonseparable problems. It was shown in [35] that sep-CMA-ES is not only faster, but also performs surprisingly well on several separable and nonseparable functions of up to 1000 dimensions, outperforming the original CMA-ES.

In this paper, sep-CMA-ES [35], [36] was chosen for comparison against the proposed CCPSO2. In our experiments, each individual in the initial sep-CMA-ES population is generated uniformly within the variable bounds $[A, B]^n$. The initial step size $\sigma$ is set to $(B - A)/2$. The population size is determined by $n$ such that $\lambda = 4 + \lfloor 3\ln(n) \rfloor$ and $\mu = \lfloor \lambda/2 \rfloor$, where $n$ is the dimensionality of the problem, $\lambda$ is the number of offspring, and $\mu$ is the number of parents, as suggested in [35]. A number of stopping criteria were used. For example, sep-CMA-ES stops if the range of the best objective function values of the last $10 + \lceil 30n/\lambda \rceil$ iterations is zero or below a small specified value, if the standard deviation of the normal distribution is smaller than a specified threshold in all coordinates, or if the maximum allowed number of evaluations is reached.
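For reference, the sep-CMA-ES settings quoted above translate to the following small sketch (variable names and the example bounds are illustrative):

```python
# A minimal sketch of the sep-CMA-ES population sizing and initial step size
# described above; names and the example bounds are assumptions.
import math

def sep_cma_es_settings(n, lower_bound, upper_bound):
    """n: problem dimensionality; [lower_bound, upper_bound] is the search range."""
    lam = 4 + math.floor(3 * math.log(n))        # offspring number, lambda
    mu = lam // 2                                # parent number, mu
    sigma0 = (upper_bound - lower_bound) / 2.0   # initial step size sigma
    return lam, mu, sigma0

# e.g., sep_cma_es_settings(1000, -100, 100) -> (24, 12, 100.0)
```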

SECTION VI

RESULTS AND ANALYSIS

This section consists of five parts. In the first part, we compare several PSO variants using different update rules, and demonstrate why CGPSO is the best subcomponent optimizer for CCPSO2. In the second part, we examine the effects of CCPSO2 using random grouping with changing group sizes. The results of CCPSO2 on 500-D functions are compared with those using a prespecified fixed group size. In the third part, we compare the results of both CCPSO2 and sep-CMA-ES on functions of 100, 500, and 1000 dimensions. We then compare CCPSO2 with two existing PSO algorithms and a cooperative coevolving DE on functions of 1000 dimensions, for which the results were obtained from the CEC 2008 competition on LSGO [37]. In the final part we present results on functions of 2000 dimensions in order to further challenge CCPSO2's ability to scale to even higher dimensions.

A. Why Cauchy and Gaussian PSO?

In Section IV-B, we provided the rationale for choosing CGPSO as the subcomponent optimizer for each swarm under the CC framework. In this section, we present empirical evidence to support this choice. For a function of 1000 dimensions, $s$ may range from a few dimensions up to several hundred. In this study, we compared four PSO variants using the ring topology: constricted PSO (CPSO) [i.e., using (1) and (2)], bare-bones PSO (3), GPSO [25], and CGPSO (4). We chose $f_1$ to $f_7$ of 2, 5, 10, 20, 50, and 100 dimensions. Using a small population of 30, each algorithm was run 50 times (with each run allowed 300,000 FES), and the mean and standard error of the best fitness were recorded. As shown in Figs. 2 and 3, for $f_1$ to $f_7$ of up to 50 dimensions, CGPSO is the overall best performer. However, when the dimensionality is increased to 100, CGPSO's lead is no longer obvious. CGPSO outperformed CPSO on $f_1$, $f_4$, and $f_5$, but lost to CPSO on $f_2$, $f_3$, and $f_6$. Both CGPSO and CPSO performed similarly on $f_7$. Note that the global optimum of $f_7$ is usually unknown (unlike the other functions) and may change depending on the dimensionality. The results of the other two variants, bare-bones PSO and GPSO, were not competitive. In particular, GPSO tended to converge prematurely on several functions. Comparing GPSO and CGPSO, it is noticeable that using a combination of Cauchy and Gaussian distributions is far more effective than using only a Gaussian distribution for sampling. Overall, CGPSO appears to be the most consistent performer and better overall than the other variants. Typically, for a function of 500 dimensions, a subcomponent size (i.e., group size $s$) ranging from 2 to 100 dimensions is most likely to be adaptively chosen (see Section VI-B) during the CC process. Hence, we selected CGPSO as the subcomponent optimizer for each swarm of CCPSO2.

Fig. 2. Averaged best fitness values of four PSO variants (using a ring topology) on $f_1$ of 2, 5, 10, 20, 50, and 100 dimensions.
Fig. 3. Averaged best fitness values (with one standard error) of four PSO variants using a ring topology on $f_2$ to $f_7$ of 2, 5, 10, 20, 50, and 100 dimensions. (a) $f_2$ SchwefelProblem. (b) $f_3$ ShiftedRosenbrock. (c) $f_4$ ShiftedRastrigin. (d) $f_5$ ShiftedGriewank. (e) $f_6$ ShiftedAckley. (f) $f_7$ FastFractal.

B. Effects of Dynamically Changing Group Size

Table I compares the results of CCPSO2 using a dynamically changing group size $s$ with those using a prespecified fixed group size. When different fixed group sizes are used, performance fluctuates considerably, depending heavily on the given $s$ value. In contrast, CCPSO2 using a dynamically changing $s$ value consistently gave better performance than the other variants. The closest rival to CCPSO2 is the $s = 50$ variant, which only outperformed CCPSO2 on $f_2$. However, in most real-world problems, we do not have any prior knowledge of the optimal $s$ value.

TABLE I RESULTS ON 500-D FUNCTIONS OF CCPSO2 USING RANDOM GROUPING WITH A CHANGING GROUP SIZE AND OF CCPSO2 VARIANTS USING A PRESPECIFIED FIXED GROUP SIZE

Among the CCPSO2 variants using a fixed $s$ value, it is interesting to note that a small fixed $s$ is not necessarily better even for separable functions such as $f_1$; the $s = 100$ variant is in fact much better than the $s = 5$ variant. The better performance of CCPSO2 using a dynamically changing $s$ may be attributed to the diverse range of $s$ values that CCPSO2 can utilize during a run. CCPSO2 can use smaller $s$ values to evolve small subcomponent-based intermediate solutions first, and then use larger $s$ values to evolve the combined intermediate solutions into even fitter overall solutions.

Fig. 4 shows typical CCPSO2 runs with a dynamically changing group size on the CEC'08 test functions. It is noticeable that for the separable functions $f_1$, $f_4$, $f_6$, and $f_5$ (which behaves more like a separable function in 500-D), CCPSO2 tended to choose a mixture of small and large group sizes at different stages of a run. For nonseparable functions such as $f_2$, $f_3$, and $f_7$, CCPSO2 tended to favor a small group size throughout the run. Since the group size remains unchanged only when the performance keeps improving, this suggests that CCPSO2 was still able to improve its performance, though very slowly.

Fig. 4. Changing group size during a CCPSO2 run on $f_1$ to $f_7$ of 500 dimensions. (The result for $f_3$ ShiftedRosenbrock is not shown, as it has an unchanged group size of 5 over the run.) (a) $f_1$ ShiftedSphere. (b) $f_2$ SchwefelProblem. (c) $f_4$ ShiftedRastrigin. (d) $f_5$ ShiftedGriewank. (e) $f_6$ ShiftedAckley. (f) $f_7$ FastFractal.

C. Comparing CCPSO2 With sep-CMA-ES

Table II shows the results on the 11 test functions of 100-D, 500-D, and 1000-D. sep-CMA-ES scaled very well from 100-D to 1000-D on $f_1$, $f_3$, and $f_{4r}$, outperforming CCPSO2 in all these cases. On $f_5$ and $f_{5r}$, the performance of sep-CMA-ES is better on 100-D, but not statistically different on 500-D and 1000-D. Figs. 5 to 8 show the convergence plots, where it can be noted that sep-CMA-ES achieved very fast convergence on $f_1$, $f_3$, $f_5$, and $f_{5r}$.

TABLE II RESULTS OF CCPSO2 AND SEP-CMA-ES ON TEST FUNCTIONS OF 100-D, 500-D, AND 1000-D

In comparison, CCPSO2 clearly outperformed sep-CMA-ES on $f_2$, $f_4$, $f_6$, and $f_7$. On $f_{6r}$, CCPSO2 performed better on 100-D and 500-D, but its performance is not statistically different from that of sep-CMA-ES on 1000-D. There is also no statistical difference on $f_{3r}$. It can also be noted that rotating $f_4$ into $f_{4r}$ degraded the performance of CCPSO2, but had little effect on sep-CMA-ES.

On $f_2$, $f_4$, $f_6$, and $f_{6r}$, sep-CMA-ES converged prematurely as soon as a run started. sep-CMA-ES also performed poorly compared with CCPSO2 on the $f_7$ (FastFractal) function, which has a more complex and irregular fitness landscape and is highly multimodal. The standard CMA-ES (which uses the full covariance matrix) was also tested on $f_2$, $f_4$, and $f_6$, but suffered from the same problem of premature convergence.

The offspring population size $\lambda$ of a sep-CMA-ES run was determined by $\lambda = 4 + \lfloor 3\ln(n) \rfloor$. For $n$ equal to 100, 500, and 1000, $\lambda$ is 17, 22, and 24, respectively. Further tests revealed that these population sizes were too small for sep-CMA-ES (or CMA-ES) to perform well on $f_2$, $f_4$, $f_6$, and $f_7$, though such small population sizes did not pose any trouble on $f_1$, $f_3$, and $f_5$. For sep-CMA-ES to perform well, it requires a much larger population size in order to sample effectively around the mean of the selected fit individuals. This suggests that the proposed formula for determining the population size, $\lambda = 4 + \lfloor 3\ln(n) \rfloor$, is not suitable for multimodal functions of higher dimensions ($n > 100$). This observation is also supported in [38], where a study was carried out specifically on CMA-ES over multimodal functions. It shows that on the Rastrigin function of only 80 dimensions and the Schwefel function of 20 dimensions, CMA-ES requires a $\lambda$ greater than 1000 in order to maintain a success rate of 95% (Figs. 2 and 3 of [38]). In [35], a different formula for determining the population size was also studied, $\lambda = 2n$; nevertheless, this means that the population will be unrealistically large for functions of, for instance, 1000 dimensions. Fig. 9(a) shows the results of sep-CMA-ES on $f_4$ (ShiftedRastrigin) of 500 dimensions over population sizes ranging from 100 to 1500. It can be seen that as the population size increases, the performance also improves; however, even with a population size of 1500, the best fitness achieved was 14.725, which is still worse than that of CCPSO2. Fig. 9(b) shows that on the more complex $f_7$ (FastFractal), the best result achieved was $-7057.21$ (when a population size of 400 was used), which is still worse than the best result obtained by CCPSO2 (i.e., $-7.23\mathrm{E}{+}03$, as shown in Table II). Clearly, sep-CMA-ES's performance is sensitive to the population size parameter, and even if the optimal population size is chosen, CCPSO2 still outperforms sep-CMA-ES on these multimodal functions. More importantly, the performance of CCPSO2 does not depend on a large population size: for all experiments, a small swarm of 30 particles per coevolving subcomponent was shown to be sufficient to produce reasonable performance.

Fig. 5. Averaged best fitness values for sep-CMA-ES and CCPSO2 on functions of 500 dimensions. (a) $f_1$ ShiftedSphere. (b) $f_2$ SchwefelProblem. (c) $f_3$ ShiftedRosenbrock. (d) $f_4$ ShiftedRastrigin. (e) $f_5$ ShiftedGriewank. (f) $f_6$ ShiftedAckley. (g) $f_7$ FastFractal. (h) $f_{3r}$ ShiftedRotatedRosenbrock. (i) $f_{4r}$ ShiftedRotatedRastrigin.
Fig. 6. Averaged best fitness values for sep-CMA-ES and CCPSO2 on functions of 1000 dimensions. (a) $f_1$ ShiftedSphere. (b) $f_2$ SchwefelProblem. (c) $f_3$ ShiftedRosenbrock. (d) $f_4$ ShiftedRastrigin. (e) $f_5$ ShiftedGriewank. (f) $f_6$ ShiftedAckley. (g) $f_7$ FastFractal. (h) $f_{3r}$ ShiftedRotatedRosenbrock. (i) $f_{4r}$ ShiftedRotatedRastrigin.
Fig. 7. Averaged best fitness values for sep-CMA-ES and CCPSO2 on $f_{5r}$ and $f_{6r}$ of 500 dimensions. (a) $f_{5r}$. (b) $f_{6r}$.
Fig. 8. Averaged best fitness values for sep-CMA-ES and CCPSO2 on $f_{5r}$ and $f_{6r}$ of 1000 dimensions. (a) $f_{5r}$. (b) $f_{6r}$.
Fig. 9. Averaged best fitness values of sep-CMA-ES on $f_4$ and $f_7$ of 500 dimensions, over a range of population sizes. (a) $f_4$ ShiftedRastrigin. (b) $f_7$ FastFractal.

Table II shows that, overall, for the 11 test functions, CCPSO2 performed significantly better than sep-CMA-ES on five functions, while losing to sep-CMA-ES on three functions. For the remaining three test functions, there is no statistical difference between the two. Most importantly, CCPSO2 performed better than sep-CMA-ES on high-dimensional multimodal functions with a complex fitness landscape, such as $f_7$, which arguably more closely resemble real-world problems. The CC framework, which allows a high-dimensional problem to be decomposed into small subcomponents, is a key contributing factor in CCPSO2's ability to scale well to very high-dimensional problems.

CCPSO2 tended to converge more slowly than sep-CMA-ES, but had more potential to further improve its performance in the later stages of a run. This may be attributed to the use of the Cauchy distribution in CCPSO2 to encourage more exploration. In contrast, sep-CMA-ES converged extremely fast; however, it either converged very well or stagnated very quickly, especially on complex high-dimensional multimodal functions.

D. Comparing CCPSO2 With Other Algorithms

Table III shows the results of CCPSO2 on the seven CEC'08 benchmark test functions of 1000 dimensions, in comparison with two recently proposed PSO variants, EPUS-PSO [39] and DMS-PSO [40], and a CC DE algorithm, MLCC [19]. The results of these algorithms were obtained under the same criteria set out by the CEC'08 special session on LSGO [5]. A two-tailed $t$-test was conducted only between CCPSO2 and MLCC, since CCPSO2 clearly outperformed EPUS-PSO and DMS-PSO.

TABLE III COMPARISON BETWEEN CCPSO2, EPUS-PSO, DMS-PSO, AND MLCC ON 1000-D FUNCTIONS

EPUS-PSO used an adaptive population sizing strategy to adjust the population size according to the search results [39]. DMS-PSO adopted a random regrouping strategy to introduce a dynamically changing neighborhood structure for each particle; every now and then, the population is regrouped into multiple small subpopulations according to the new neighborhood structures [40]. MLCC also adopts a CC approach (like CCPSO2), applying both random grouping and adaptive weighting [19]. Given a fixed number of evaluations, applying adaptive weighting comes at the expense of random grouping, which proves to be less effective [14]. In addition, MLCC employs a self-adaptive mechanism to favor certain group sizes over others when invoking the random grouping scheme. All three algorithms participated in the CEC 2008 competition on LSGO [37].

CCPSO2 outperformed EPUS-PSO on six of the seven CEC'08 test functions, the exception being $f_2$. CCPSO2 also outperformed DMS-PSO on five functions, the exceptions being $f_1$ and $f_5$. DMS-PSO found the global optimum of $f_1$ and $f_5$, but its performance was exceptionally poor in comparison with CCPSO2 on the other five functions. CCPSO2 adopts a CC framework to decompose a high-dimensional problem, whereas DMS-PSO relies on a large number of small sub-swarms (each having only three particles) to maintain its population diversity. DMS-PSO does not employ any dimensionality decomposition strategy; as a result, it uses much larger population sizes to handle high-dimensional problems. For example, for the 100-D, 500-D, and 1000-D functions, the population sizes were 90, 450, and 900, respectively. In contrast, CCPSO2 uses just a small population size of 30 for all dimensions. This suggests that DMS-PSO may only work well on some unimodal fitness landscapes, but handles complex multimodal fitness landscapes very poorly.

Compared with MLCC, CCPSO2 is better on $f_1$, $f_2$, and $f_3$, but statistically not different on $f_5$ and $f_6$. MLCC is better on $f_4$ and $f_7$. MLCC uses a self-adaptive neighborhood search DE (SaNSDE) [12] as its core algorithm to coevolve its CC subcomponents. SaNSDE is able to maintain a better diversity of search step sizes and of the population, thereby contributing to its better performance on the multimodal problems.

E. Comparisons on Functions of 2000 Dimensions

From Section VI-C (Table II, Figs. 5 and 6), we noticed that on $f_2$, $f_4$, and $f_6$ of 100, 500, and 1000 dimensions, sep-CMA-ES always converged prematurely not long after the run started, and was outperformed substantially by CCPSO2 on these functions. It is reasonable to expect that sep-CMA-ES would be outperformed by CCPSO2 on these functions at even higher dimensions, such as 2000-D.

Table IV shows the results of CCPSO2 and sep-CMA-ES on the three other CEC'08 test functions, $f_{1}$, $f_{3}$, and $f_{7}$, of 2000 dimensions. These results were averaged over 25 runs, with each run consuming $1.0\times 10^{7}$ FES. To our knowledge, this paper is the first to report results on these functions of 2000 dimensions. Overall, both CCPSO2 and sep-CMA-ES continued to scale well on $f_{1}$ and $f_{3}$ of 2000 dimensions. sep-CMA-ES performed better than CCPSO2 on $f_{1}$ (a separable function) and $f_{3}$, but was outperformed by CCPSO2 on the more complex multimodal $f_{7}$. Considering sep-CMA-ES's tendency to converge prematurely on the multimodal functions $f_{2}$, $f_{4}$, and $f_{6}$, CCPSO2 once again shows a better search capability for complex high-dimensional multimodal functions.

TABLE IV
RESULTS OF CCPSO2 AND SEP-CMA-ES ON $f_{1}$, $f_{3}$, AND $f_{7}$ OF 2000 DIMENSIONS
SECTION VII

CONCLUSION

In this paper, we presented a new CC PSO algorithm, CCPSO2, for tackling large-scale optimization problems. We demonstrated that the adopted CC framework is a powerful approach to improving PSO's ability to scale up to high-dimensional optimization problems (of up to 2000 real-valued variables). Several new techniques were incorporated into CCPSO2 to enhance its ability to handle high-dimensional problems: a novel PSO model (CGPSO) was developed to sample a particle's next point using a combination of Cauchy and Gaussian distributions, with an $lbest$ ring topology defining each particle's local neighborhood, and the random grouping scheme was employed with a dynamically changing group size. CGPSO showed improved search capability compared with existing PSO models, and our study of dynamically changing group sizes showed that on most high-dimensional functions a combination of small and reasonably large group sizes is advantageous.
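For illustration, the following Python sketch shows how a round-robin co-evolutionary cycle with a dynamically re-drawn group size might be organized. It is a toy rendering only: the candidate group sizes, the rule of re-drawing a size whenever the best fitness fails to improve, and the random-perturbation "optimizer" standing in for the PSO subcomponent are all assumptions made for brevity, not the exact mechanisms of CCPSO2.

import random

def cc_optimize(fitness, dim, cycles=100, group_sizes=(2, 5, 10, 50)):
    # Toy sketch of cooperative coevolution with a dynamically re-drawn
    # group size (minimization). A shared context vector holds the current
    # value of every variable; each subcomponent is "optimized" in turn
    # while the remaining variables are held fixed.
    context = [random.uniform(-5.0, 5.0) for _ in range(dim)]
    best = fitness(context)
    size = random.choice(group_sizes)
    for _ in range(cycles):
        # random grouping: shuffle the variable indices, slice into groups
        idx = list(range(dim))
        random.shuffle(idx)
        groups = [idx[i:i + size] for i in range(0, dim, size)]
        improved = False
        for group in groups:                    # one round-robin cycle
            trial = list(context)
            for j in group:                     # stand-in for the PSO step
                trial[j] += random.uniform(-0.1, 0.1)
            f = fitness(trial)
            if f < best:
                context, best, improved = trial, f, True
        if not improved:                        # no progress: try another size
            size = random.choice(group_sizes)
    return context, best

# Usage: minimize the Sphere function in 100 dimensions.
solution, value = cc_optimize(lambda x: sum(v * v for v in x), dim=100)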

Furthermore, CCPSO2 was compared with a state-of-the-art evolutionary algorithm, sep-CMA-ES. Our results showed that, with only a small population size, CCPSO2 was very competitive on high-dimensional functions with a more complex multimodal fitness landscape, such as $f_{7}$ (FastFractal), whereas sep-CMA-ES performed slightly better on the unimodal functions. Our findings also revealed that the performance of sep-CMA-ES can degrade rapidly on a multimodal fitness landscape, even when a very large population size is chosen. Further comparative studies on 1000-D functions suggest that CCPSO2's performance is comparable to that of a CC DE algorithm [19], and much better than that of two other PSO algorithms specifically designed for large-scale optimization.

We also challenged CCPSO2 with functions of 2000 dimensions; the results show that CCPSO2 continued to outperform sep-CMA-ES on the more complex multimodal function $f_{7}$ (FastFractal), while performing reasonably well on the unimodal functions.

In future work, we plan to examine more "intelligent" grouping strategies to better capture the interdependencies among variables [41], rather than relying on purely random grouping. We are also interested in applying CCPSO2 to real-world problems such as shape optimization [42] to ascertain its true potential as a valuable technique for large-scale optimization.

ACKNOWLEDGMENT

The authors would like to thank Z. Yang for insightful discussions on the working of DECC-G and M. N. Omidvar for valuable comments.

Footnotes

This work was supported by EPSRC, under Grant EP/G002339/1, which funded the first author's two trips to Birmingham, U.K., as a Visiting Research Fellow in 2008 and 2009.

X. Li is with the School of Computer Science and Information Technology, Royal Melbourne Institute of Technology, Melbourne, VIC 3001, Australia (e-mail: xiaodong.li@rmit.edu.au).

X. Yao is with the Center of Excellence for Research in Computational Intelligence and Applications, School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K. (e-mail: x.yao@cs.bham.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

1The shape of the Lévy distribution can be controlled by a parameter $\alpha$. For $\alpha=2$ it is equivalent to the Gaussian distribution, whereas for $\alpha=1$ it is equivalent to the Cauchy distribution [22].

2In CCPSO2 a coevolutionary cycle is one round-robin pass of coevolution of all subpopulations. Hence a cycle is equivalent to an iteration here.

3Note that $f_{3}$ and $f_{5}$ are already nonseparable.

4The run may be terminated if there is no further improvement after several iterations, as described in Section V-B.


Authors

Xiaodong Li

Xiaodong Li (M'03–SM'07) received the B.S. degree from Xidian University, Xi'an, China, in 1988, and the Dipl.Com. and Ph.D. degrees from the University of Otago, Dunedin, New Zealand, in 1992 and 1998, respectively, all in information science.

He is currently with the School of Computer Science and Information Technology, Royal Melbourne Institute of Technology, Melbourne, Australia. His current research interests include evolutionary computation (in particular, evolutionary multiobjective optimization, evolutionary optimization in dynamic environments, and multimodal optimization), neural networks, complex systems, and swarm intelligence.

Dr. Li is an Associate Editor of the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION and the International Journal of Swarm Intelligence Research. He is a member of the IEEE Computational Intelligence Society (CIS) Task Force on Swarm Intelligence (since its inception), the IEEE CIS Task Force on Evolutionary Computation in Dynamic and Uncertain Environments, and the IEEE CIS Task Force on Large Scale Global Optimization. He is a member of the Technical Committee on Soft Computing of the Systems, Man, and Cybernetics Society, and a member of the IASR Board of Editors for the Journal of Advanced Research in Evolutionary Algorithms.

Xin Yao

Xin Yao (M'91–SM'96–F'03) received the B.S. degree from the University of Science and Technology of China (USTC), Hefei, China, in 1982, the M.S. degree from the North China Institute of Computing Technology, Beijing, China, in 1985, and the Ph.D. degree from USTC in 1990.

He was an Associate Lecturer and a Lecturer with the USTC, from 1985 to 1990, a Post-Doctoral Fellow with the Australian National University, Canberra, Australia, and CSIRO, Melbourne, Australia, from 1990 to 1992, and a Lecturer, Senior Lecturer, and Associate Professor with the University of New South Wales at the Australian Defence Force Academy, Canberra, from 1992 to 1999. Since April 1999, he has been a Professor (Chair) of computer science with the University of Birmingham, Birmingham, U.K., where he is currently the Director of the Center of Excellence for Research in Computational Intelligence and Applications. He is also a Distinguished Visiting Professor (Grand Master Professorship) of USTC. He has more than 350 refereed publications. His current research interests include evolutionary computation and neural network ensembles.

Dr. Yao was the recipient of the 2001 IEEE Donald G. Fink Prize Paper Award, the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION Outstanding 2008 Paper Award, and several other best paper awards. He is a Distinguished Lecturer of the IEEE Computational Intelligence Society, has been an invited keynote/plenary speaker at 60 international conferences, and was the Editor-in-Chief of the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION from 2003 to 2008.
