SECTION I

## INTRODUCTION

In the past decades, numerous population-based algorithms, such as evolutionary algorithms (EAs) [1], particle swarm optimizers [2], and differential evolution [3], have been used for numerical optimization. However, their performance may vary greatly from one problem to another and there is no best algorithm for all problems. In practice, there is an inherent risk associated with choosing a single algorithm for a given problem, as we do not know in advance which algorithm is optimal for the problem. For many real-world applications, finding the global optimum may not be practical or even possible. The best solution found by a given deadline is often considered to be very important. As opposed to an algorithm that performs extremely well on some problems but very poorly on others, an algorithm that works well on a large variety of problems would be more desirable for many real-world applications (of course, we do not want to sacrifice the algorithm's performance to achieve this). To obtain such an algorithm, two issues need to be addressed. First, given a set of problems from different classes, the question of how to evaluate the risk of applying an available algorithm to these problems needs to be answered. Second, we need to find a way of minimizing this risk, i.e., to develop a low-risk algorithm.

Unfortunately, no formal measurable definition of an algorithm's risk on a set of different problems is available so far, not to mention any effort on reducing the risk. This paper presents a metric for comparing the risks associated with two algorithms. It also proposes a population-based algorithm portfolio (PAP), which not only reduces the risk of failing on problems from different classes, but also makes finding high-quality solutions more likely.

The basic idea of PAP is simple: instead of betting the entire time budget on a single algorithm, we “invest” our time in multiple algorithms. This idea has been explored for more than ten years, based on the theory of investment portfolios that was developed in the field of economics to answer the question: “How should one allocate his/her financial assets (stocks, bonds, etc.) in order to maximize the expected returns while minimizing risks” [4]. In analogy with investment portfolios, PAP focuses on the problem of how to allocate computation time among algorithms and fully utilize the advantages of these algorithms in order to maximize the expected utility of a problem solving episode. Based on the portfolio idea, various approaches have been developed, either to minimize the expected time needed to solve problem instances or to maximize the expected quality of the solution while the available time is kept constant. However, all the existing approaches were proposed for combinatorial problems. For example, Huberman et al. [5] proposed an economic approach for constructing “portfolios” of multiple algorithms to solve combinatorial problems. They showed how executing two or more Las Vegas algorithms could significantly shorten the expected time for finding a feasible solution to the graph coloring problem. Gomes and Selman [6] applied a Las Vegas algorithm portfolio to the quasi-group completion problem. The purpose was also to reduce the expected time for finding a solution. In their empirical study, the portfolio approach led to a significant performance improvement. Fukunaga [4] extended the algorithm portfolio approach to resource-bounded combinatorial optimization by combining instances of EAs with multiple sets of control parameter value settings into algorithm portfolios.

Compared to the existing work on algorithm portfolios, this paper has two major contributions. First, the proposed PAP is developed for numerical optimization instead of combinatorial optimization. Second, we investigate the term “risk” in a different context. Previous work mainly aimed to reduce the risk of an algorithm on a specific optimization problem, which can be measured by applying the algorithm to the problem multiple times. In this paper, we are more interested in reducing the risk over a set of problems, i.e., the risk is measured by applying the algorithm to multiple problems.

Our PAP is a general framework for combining different population-based search algorithms. It allocates computation time among more than one constituent algorithm and activates interaction among them so that they can benefit from one another. To be more specific, the computation time is measured by the number of fitness evaluations (FEs), and the allocation of FEs is implemented by dividing the whole population into several subpopulations and evolving them in a parallel manner (using different constituent algorithms). To demonstrate the efficacy of the proposed PAP, we chose four existing population-based algorithms, including self-adaptive differential evolution with neighborhood search (SaNSDE) [7], particle swarm optimizer with inertia weight (wPSO) [8], generalized generation gap (G3) model with generic parent-centric recombination (PCX) operator (G3PCX) [9], and covariance matrix adaptation evolution strategy (CMA-ES) [10], as the constituent algorithms. These algorithms belong to four well-known families of EAs and thus enable us to investigate PAP in the general context of population-based algorithms. Eleven instantiations of PAP were implemented using these constituent algorithms. Six of them consist of two constituent algorithms, another four are composed of three constituent algorithms, and the remaining one contains four constituent algorithms. The PAP instantiations were then evaluated on both the classical benchmark functions used by Yao et al. [11] and CEC2005 benchmark functions [12].

The rest of this paper is organized as follows. Section II describes the proposed PAP in detail. Section III presents a new metric for comparing the risks associated with two algorithms. Experimental studies are presented in Section IV. Section V presents some further analyses to investigate the characteristics of PAP more comprehensively. Finally, we draw conclusions and discuss future work in Section VI.

SECTION II

## POPULATION-BASED ALGORITHM PORTFOLIOS FOR REDUCING RISK

### A. General Framework

Assume that an optimization problem $f$, some computation time (i.e., FEs) and $m$ population-based algorithms $A=\{A_{i}\vert i=1,2,\ldots,m\}$ are given. We aim to find the optimal solution to the problem. In PAP, computation time is allocated to the $m$ algorithms before the problem-solving process, and then the $m$ algorithms search for the optimal solution within their own time budgets. Specifically, PAP holds $m$ separate subpopulations, and each constituent algorithm works on one of them. Migration of individuals among subpopulations is activated regularly to encourage information sharing. PAP adopts a very simple migration scheme. Two parameters, $migration\_interval$ and $migration\_size$, are defined in advance; they determine the number of generations between two subsequent migrations and the number of emigrants in each migration, respectively. Every time a migration is activated, it involves all constituent algorithms. For each constituent algorithm (say, $A_{i}$), we combine the subpopulations of the remaining $m-1$ algorithms, and identify and duplicate the best $migration\_size$ individuals. Then, these copies are migrated to the subpopulation of $A_{i}$. After that, the worst $migration\_size$ individuals of the new subpopulation of $A_{i}$ (i.e., with the emigrants) are discarded. PAP terminates when the given FEs have been used up. The pseudo-code of PAP is shown in Fig. 1.

Fig. 1. Pseudo-code of PAP.
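The migration scheme described above can be illustrated with a short sketch. This is a minimal Python illustration, not the pseudo-code of Fig. 1: the representation of an individual as a `(fitness, vector)` pair, the convention that lower fitness is better, the function name `migrate`, and the choice to compute all migrations from a snapshot of the subpopulations taken before the migration are assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical representation: (fitness, decision vector); lower fitness is better.
Individual = Tuple[float, List[float]]


def migrate(subpops: List[List[Individual]], migration_size: int) -> List[List[Individual]]:
    """Return the subpopulations after one migration, leaving the inputs untouched."""
    new_subpops = []
    for i, own in enumerate(subpops):
        # Pool the subpopulations of the remaining m-1 algorithms.
        others = [ind for j, sp in enumerate(subpops) if j != i for ind in sp]
        # Duplicate the best `migration_size` individuals of that pool.
        best = sorted(others, key=lambda ind: ind[0])[:migration_size]
        copies = [(fit, list(vec)) for fit, vec in best]
        # Add the copies to A_i's subpopulation, then discard the worst
        # individuals so that the subpopulation size stays unchanged.
        merged = sorted(own + copies, key=lambda ind: ind[0])
        new_subpops.append(merged[:len(own)])
    return new_subpops


if __name__ == "__main__":
    pops = [[(5.0, [0.0]), (6.0, [0.0])], [(1.0, [1.0]), (7.0, [1.0])]]
    print(migrate(pops, migration_size=1))
```

In a full PAP loop, such a function would be called once every $migration\_interval$ generations, between the generation steps of the constituent algorithms.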

In practice, it is often the case that an algorithm continuously finds better solutions at the early stage of the search. As the search proceeds, it becomes more and more difficult to find a solution that is better than the best one found so far. In the worst case, the search might be trapped in a local optimum after a certain number of FEs and no further improvements can be made after that. If such a scenario occurs before the time budget is used up, keeping on running the same algorithm using the remaining FEs is a waste of time. By investing computation time in multiple algorithms, it is more likely that the search will keep progressing along the whole search episode, since the chance of all the constituent algorithms being stuck at the same point in the solution space is quite small. From the risk reduction perspective, the computation time allocation strategy alone reduces the risk of suffering from premature convergence. One might argue that we can switch to another algorithm once an algorithm is trapped in a local optimum for a long time. However, it is never easy to decide when to switch, and to which algorithm we should switch. Instead, allocating computational resources in advance might be a better choice due to its ease of implementation. The PAP presented in Fig. 1 can be seen as a first attempt to implement the resource allocation idea for numerical optimization.

In addition to the allocation of computation time, migration of individuals among subpopulations is also indispensable to PAP. Let us view the quality of the solution obtained by an algorithm as a function of the computation time $t$ it consumes. As long as the best solution found so far is always retained during the search process (either in the population or at some other place), running an algorithm with additional time will never deteriorate the quality of the final solution. This fact can be formally presented as the following inequality:

$$E[q(A_{i},f,t_{i})] \leq E[q(A_{i},f,T)], \quad \text{if } t_{i}\leq T \tag{1}$$

where $q(A_{i},f,t_{i})$ denotes the quality of the best solution obtained by $A_{i}$ with time $t_{i}$. We use the expectation form because $q(A_{i},f,t_{i})$ is inherently a random variable that depends on random initializations. Since PAP allocates the computation time before searching for solutions, the time $t_{i}$ assigned to $A_{i}$ can be viewed as a constant. Hence, if the constituent algorithms of PAP conduct search independently (i.e., no migration), the following relation holds for any problem $f$:

$$\begin{aligned}
E[q(PAP_{ind},f,T)] &= \max\{E[q(A_{i},f,t_{i})]\} \\
&\leq \max\{E[q(A_{i},f,T)]\}, \quad i=1,\ldots,m, \\
&\quad \text{subject to } \sum_{i=1}^{m} t_{i}=T
\end{aligned} \tag{2}$$

where $PAP_{ind}$ denotes PAP whose constituent algorithms work independently. Equation (2) shows that, in expectation, allocating computation time to multiple algorithms and running them independently can do no better than giving the whole budget $T$ to the best single constituent algorithm. Hence, migration among subpopulations is necessary so as to break the law demonstrated by (2).

### B. Related Work

The PAP can be viewed as a framework that utilizes different search biases for problem solving. In this section, we review some previous relevant work and discuss the differences between them and the PAP. In general, previous work related to combining different search biases can be summarized into two categories.

The first category attempts to integrate multiple search biases from operators or algorithms. All the constituent operators or algorithms work on a shared population of solutions. Every constituent operator/algorithm has access to any solution in the population. The major effort for designing such a method is to decide for each constituent operator/algorithm when to work, what to work on, and how often to work. In the improved fast evolutionary programming [11], two offspring are generated for each parent, one by Cauchy mutation and the other by Gaussian mutation. The two offspring are then compared and the better one survives to the next generation. The mixed strategy evolutionary programming proposed by Dong et al. [15] utilizes four different mutation strategies: Gaussian, Cauchy, Lévy, and single-point mutations. Each individual chooses one of the four mutation strategies to generate its offspring based on a mixed strategy distribution, which is dynamically adjusted according to the performance of mutation strategies. In [16], a self-adaptive DE algorithm was proposed. It mixes four different DE mutation variants using a probabilistic model and adapts the probability of the four variants during the search based on their previous success rates. Whitacre et al. [17] and Thierens [18] independently investigated approaches for adapting the probabilities of applying different operators. DaCosta et al. [19] focused more on adapting the operator selection scheme with the Multiarmed Bandit paradigm.

More recently, Vrugt and Robinson [20] proposed a multialgorithm genetically adaptive method (AMALGAM), which incorporates various algorithms in one framework. At each generation, offspring are generated using different algorithms [20], and the number of offspring created by each algorithm is adapted based on their performance in the previous generations. Another well-studied approach, namely the asynchronous teams (A-Teams), works in a similar manner [21], [22]. A lot of emphasis has been put into developing universal structures or even a toolbox for generalizing A-Teams to various real-world problems [23]. At the same time as this paper was under review, Mallipeddi and Suganthan [24] proposed an ensemble of constraint handling techniques (ECHT) for constrained optimization problems. ECHT also utilizes multiple subpopulations, each of which is assigned a constraint handling technique. At each generation, each subpopulation generates new individuals based on its own. Then, each subpopulation is combined with new individuals generated by all subpopulations, and selection is conducted with the corresponding constraint handling technique. Compared to all the above-mentioned work, in which all search biases have access to the whole population, PAP allows each constituent algorithm to work only on its own subpopulation, i.e., full access to the whole population is prohibited. Migration is the only route by which different subpopulations communicate with each other. This strategy, used by PAP, reduces the likelihood of different constituent algorithms repeating similar search behaviors or sharing similar search biases.

From the perspective of employing a multipopulation model and migration operators, PAP is related to the distributed EAs (dEAs), which have several subpopulations (islands) and perform sparse exchange of individuals among them [25], [26]. However, PAP is quite different from previous dEAs in three major ways. First, most dEAs run the same EA on all subpopulations [26], [27], [28], [29], [30], [31] (either with the same or different control parameters), while PAP employs different EAs. Second, the migration scheme of PAP does not assume any topology of subpopulations, while many dEAs do. By this means, we solely focus on investigating whether it is worthwhile to allocate the limited computation time to different algorithms so that the overall performance can be enhanced. In contrast, a specific topology/structure for the relationship between subpopulations is usually explicitly defined in existing dEAs [26], [27], [28], [29], [32], [33]. Third, we are more interested in reducing the overall risk of an algorithm on a spectrum of problems. This issue has never been investigated in the literature of dEAs.

In addition to dEAs, another notable work that is related to PAP is the isolation strategy employed by hierarchically organized evolution strategies (HOES) [34], [35]. In HOES, a population is partitioned into a number of subpopulations, and ES is run on them with different control parameters (e.g., search step sizes). The isolation strategy works in such a way that the subpopulations evolve separately for a predefined isolation period (e.g., a few generations), and then the control parameters are updated based on the individuals generated in the subpopulations during the isolation period. Unlike PAP, the motivation of the isolation strategy is not to make use of different search biases, but to optimize the control parameters of a search algorithm, i.e., the adaption of control parameters was formulated as an optimization problem. Such a difference in underlying motivation has led to different research foci between PAP and existing work on isolation strategy.

### C. PAP Implementation

In practice, one may implement PAP following a few steps. First, a set of constituent algorithms should be identified. Then, these constituent algorithms need to be coded (if no implementation is readily available), and the migration scheme can be easily implemented with a few lines of code. Finally, one needs to decide how much computation time to allocate to each constituent algorithm, and set the two migration parameters. When choosing constituent algorithms, an intuition is that they should be more or less complementary. As will be shown by our empirical studies, choosing complementary constituent algorithms leads to better performance than applying the same algorithm to the problems in parallel. More specifically, the constituent algorithms should not only employ different operators, but also exhibit different behaviors on the problem set. A detailed analysis of this issue is given in Section V. For the second step, PAP should be capable of accommodating any existing population-based search algorithms since we expect it to be a general framework for combining different algorithms. However, because some existing algorithms might have their own specific configurations, they merit a bit more attention when being incorporated into the PAP framework. Below, we elaborate on a few such cases.

Some algorithms, such as PSO and SaNSDE, store and update the best solution found during the whole course of the optimization in a special variable not belonging to the population (e.g., the $gbest$ of PSO). This solution is used to guide the moves of the individuals in the population. When including algorithms of this type in PAP, we suggest updating the $gbest$ once a better solution emerges in the whole population, no matter whether it is reached by the constituent algorithms themselves or not. Some other algorithms, like G3PCX, terminate automatically if the status of the population has reached some predefined criterion. If a constituent algorithm of this type terminates before using up the FEs assigned to it, the remaining FEs will be allocated to the other constituent algorithms in proportion to the sizes of their subpopulations.
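The proportional reallocation of unused FEs can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `reallocate_fes`, the dictionary-based interface, the subpopulation sizes in the example, and the rule for handing out FEs left over by integer rounding are all hypothetical and are not specified in the text.

```python
from typing import Dict


def reallocate_fes(remaining_fes: int, subpop_sizes: Dict[str, int], terminated: str) -> Dict[str, int]:
    """Split the unused FEs of a terminated algorithm among the others,
    in proportion to the sizes of their subpopulations."""
    others = {name: size for name, size in subpop_sizes.items() if name != terminated}
    total = sum(others.values())
    shares = {name: remaining_fes * size // total for name, size in others.items()}
    # Assumption: FEs lost to integer rounding go to the largest subpopulation.
    leftover = remaining_fes - sum(shares.values())
    largest = max(others, key=others.get)
    shares[largest] += leftover
    return shares


if __name__ == "__main__":
    # Illustrative sizes only; "PCX" stops early with 9000 FEs unused.
    print(reallocate_fes(9000, {"DE": 50, "PSO": 30, "PCX": 20}, terminated="PCX"))
```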

Another type of algorithms, including CMA-ES and estimation of distribution algorithms (EDA) [36], depends on a model to generate new solutions. For example, CMA-ES updates its covariance matrix based on the current population, and the next generation of solutions is obtained by sampling according to the covariance matrix. In this case, when a migration occurs and some emigrants survive in the subpopulation of CMA-ES, the covariance matrix should be updated after the migration (i.e., based on the new subpopulation). All the above modifications aim to make existing algorithms compatible with PAP. In practice, they merely involve negligible programming work. Therefore, it is generally straightforward to implement PAP.

The final step before running PAP is to allocate FEs and set the parameters $migration\_interval$ and $migration\_size$. In general, one may allocate FEs according to some prior knowledge about the constituent algorithms. Fortunately, this kind of information is available for most well-investigated algorithms. For example, the population size required by CMA-ES can be calculated by the equation provided in [37], PSO has been shown to work well with a population size of 20–40 [38], [39], while DE usually needs a larger population size (typically 3.33 times the dimensionality of the problem) than PSO [3], [7], [40]. Alternatively, one may also develop a strategy to adapt the FE allocation scheme during the search procedure. Though the latter approach might deserve further investigation, our experimental study has shown that the former approach already works quite well. With regard to the migration parameters, our experimental study has shown that $migration\_interval =MAX\_GEN/20$ ($MAX\_GEN$ denotes the maximum number of generations) and $migration\_size =1$ work well for all the 11 instantiations of PAP. To understand the impact of PAP parameter setting on its performance, a sensitivity analysis over 16 different pairs of $migration\_interval$ and $migration\_size$ has been carried out. The results show that PAP is not very sensitive to the values of these parameters. Hence, the above two values are recommended as the default setting.

SECTION III

## METRIC FOR COMPARING RISKS ASSOCIATED WITH TWO ALGORITHMS

Consider a set of problems $F=\{f_{k}\vert k=1,2,\ldots,n\}$ and a set of candidate algorithms $A=\{A_{i}\vert i=1,2,\ldots,m\}$. We are interested in the risk of an algorithm $A_{i}$ on $F$. Intuitively, the definition of risk should reflect two aspects. First, it should indicate how likely an algorithm would fail to solve a problem in $F$. Second, if an algorithm performs better on a set of problems, it is associated with smaller risk. Following these intuitions, let us start with an attempt to quantify the risk of an algorithm. We define the probability of $A_{i}$ failing on a problem belonging to $F$ as the risk of an algorithm $A_{i}$, which can be written as

$$P(A_{i}\ \text{fails to solve a problem belonging to } F) = \sum_{k=1}^{n} P(A_{i}\ \text{fails to solve } f_{k} \mid f_{k})\,P(f_{k}), \quad f_{k}\in F. \tag{3}$$

In case no prior knowledge suggests an alternative distribution, we may further assume that all problems in $F$ have the same prior probability $P(f_{k})$. Then, (3) becomes

$$P(A_{i}\ \text{fails to solve a problem belonging to } F) = \frac{1}{n}\sum_{k=1}^{n} P(A_{i}\ \text{fails to solve } f_{k} \mid f_{k}), \quad f_{k}\in F. \tag{4}$$

To use (4) to calculate the risk, the term “failure” must be defined first. Obviously, an off-the-shelf choice is to regard an algorithm as failing on a problem if it did not find the global optimal solution. However, we do not know the real global optimal solution to a real-world problem. Even if we consider the case of benchmark functions, whose global optima are known in advance, it is possible that no algorithm can find the global optimum of a problem. Consequently, if $F$ consists of a lot of such functions, (4) can hardly differentiate between such algorithms, and thus is useless in practice. A possible compromise is to say that $A_{i}$ fails on $f_{k}$ if the quality of the obtained solution does not reach a human-defined threshold. However, since the value of the threshold can be very subjective, (4) will also become subjective, and will not be a good metric for evaluating algorithms. In short, quantifying the risk associated with an algorithm is a nontrivial task due to the difficulty of defining the “failure” of an algorithm on a problem. A more realistic alternative is to compare the risks associated with different algorithms based on the solution quality they achieved. Recall that our ultimate goal of studying risk is to choose an appropriate algorithm. This alternative, although not allowing for an explicit calculation of the risk, is sufficient for our purpose. Therefore, we propose the following method to compare the risks of two algorithms on a set of problems.

Denote $q_{i,k}$ as the quality of the best solution obtained by $A_{i}$ on $f_{k}$ in a single run. We consider the following question: given a problem set and two algorithms $A_{i}$ and $A_{j}$, which one is associated with a higher risk on $F$? Similar to (3) and (4), our answer is: $A_{i}$ is less risky than $A_{j}$ iff the conditional probability of $A_{i}$ outperforming $A_{j}$ with respect to $F$ is larger than the conditional probability of $A_{j}$ outperforming $A_{i}$ with respect to $F$. Here, we say $A_{i}$ outperforms $A_{j}$ if $q_{i,k}>q_{j,k}$ (i.e., $A_{i}$ obtained a solution of higher quality). Under the assumption that every $f_{k}$ has the same prior probability, the probability of $A_{i}$ outperforming $A_{j}$ can be calculated by the following equation:

$$P(A_{i}\ \text{outperforming}\ A_{j} \mid F) = \frac{1}{n}\sum_{k=1}^{n} P(q_{i,k}>q_{j,k} \mid f_{k}), \quad \forall f_{k}\in F. \tag{5}$$

Similarly, we have

$$P(A_{j}\ \text{outperforming}\ A_{i} \mid F) = \frac{1}{n}\sum_{k=1}^{n} P(q_{i,k}<q_{j,k} \mid f_{k}), \quad f_{k}\in F \tag{6}$$

and

$$P(A_{i}\ \text{outperforming}\ A_{j} \mid F) + P(A_{j}\ \text{outperforming}\ A_{i} \mid F) + P(A_{i}\ \text{performs the same as}\ A_{j} \mid F) = 1. \tag{7}$$

Equations (5) and (6) require estimating $P(q_{i,k}>q_{j,k})$ and $P(q_{i,k}< q_{j,k})$. This can be done by applying the algorithms to $f_{k}$ multiple times. Suppose that $A_{i}$ and $A_{j}$ are run on $f_{k}$ for $s_{i}$ and $s_{j}$ times, respectively. We randomly pick a solution obtained by $A_{i}$ and compare it to a randomly chosen solution obtained by $A_{j}$. In total, we can get $s_{i} \times s_{j}$ distinct pairs of solutions. By counting the times that a solution of $A_{i}$ beats a solution of $A_{j}$ and dividing it by $s_{i} \times s_{j}$, we get the estimate of the probability that $A_{i}$ outperforms $A_{j}$ on $f_{k}$. Equation (5) can then be estimated by repeating the above procedure for each $f_{k}$ in $F$. The conditional probability of $A_{j}$ outperforming $A_{i}$ with respect to $F$ can be estimated in the same way. Since the performance of $A_{i}$ and $A_{j}$ depends on the available budget of computation time, so does (5). As a result, the value of (5) may vary significantly with different time budgets.
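The estimation procedure can be sketched in a few lines. This is a minimal Python illustration, assuming that each run is summarized by a single quality value for which larger means better (error values from a minimization problem would have to be negated first); the function names and the toy data are hypothetical.

```python
from itertools import product
from typing import Dict, List


def prob_outperform(q_i: List[float], q_j: List[float]) -> float:
    """Estimate P(q_i > q_j) on one problem from all s_i * s_j pairs of runs."""
    wins = sum(1 for a, b in product(q_i, q_j) if a > b)
    return wins / (len(q_i) * len(q_j))


def prob_outperform_on_set(runs_i: Dict[str, List[float]], runs_j: Dict[str, List[float]]) -> float:
    """Estimate (5): average the per-problem estimates with equal weight 1/n."""
    problems = list(runs_i)
    return sum(prob_outperform(runs_i[k], runs_j[k]) for k in problems) / len(problems)


if __name__ == "__main__":
    # Toy quality values (larger is better), two runs per algorithm and problem.
    runs_i = {"f1": [3.0, 2.0], "f2": [1.0, 1.0]}
    runs_j = {"f1": [1.0, 2.0], "f2": [1.0, 5.0]}
    p_ij = prob_outperform_on_set(runs_i, runs_j)  # estimate of (5)
    p_ji = prob_outperform_on_set(runs_j, runs_i)  # estimate of (6)
    print(p_ij, p_ji, 1.0 - p_ij - p_ji)           # the last value is the tie share in (7)
```

With these toy data, $A_{i}$ would be judged less risky than $A_{j}$ because the first estimate exceeds the second; ties are counted for neither algorithm.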

Equation (5) indicates that the larger the term $P(A_{i}\ \text{outperforming}\ A_{j} \mid F)$, the less risky $A_{i}$ is in comparison to $A_{j}$. Taking a closer look at the estimation of $P(q_{i,k}>q_{j,k})$, we can find that it is closely linked to the test statistic $U$ used in the Wilcoxon rank-sum test [41]. The only difference is that we do not count the case that $A_{i}$ and $A_{j}$ perform the same (i.e., our estimation is equivalent to $U$ if $A_{i}$ and $A_{j}$ never make a draw in the $s_{i} \times s_{j}$ comparisons). This implies that (5) is naturally a metric of the overall performance of $A_{i}$ on the whole problem set $F$. The metric is consistent with our intuition that an algorithm is associated with less risk if it overall performs better than another algorithm on most problems. From the practical point of view, one main advantage of (5) is that it “normalizes” the performance of $A_{i}$ on $f_{k}$'s in terms of probabilities, so that they can be averaged without biasing to any specific $f_{k}$. If directly averaging the solution quality over different problems, those problems whose objective functions have larger magnitudes will dominate those with much smaller magnitudes.

It is important to note that the meaning of the term “risk” is rather vague, and a precise definition might not exist. The criterion proposed in this section should be used within the PAP framework only, as we have done in our empirical studies. When evaluating algorithms, it should be used together with other traditional performance metrics in order to draw a complete picture of the behavior of algorithms.

SECTION IV

## EXPERIMENTAL STUDIES

In this section, the effectiveness of PAP is empirically evaluated on 27 benchmark functions. Four algorithms, namely SaNSDE, wPSO, G3PCX, and CMA-ES, were employed as the basic constituent algorithms. These algorithms can be used to implement six instantiations of PAP with two distinct constituent algorithms, four instantiations of PAP with three constituent algorithms, and one instantiation of PAP with all four algorithms. To fully evaluate PAP's potential as a general framework, we carried out experiments with all 11 instantiations. They were compared to the four constituent algorithms alone to verify whether there would be any advantages of PAP over its constituent algorithms. We considered a fixed population for each instantiation of PAP. For this reason, the restart CMA-ES with increasing population size (IPOP-CMA-ES or G-CMA-ES) [13], which is an improved version of CMA-ES, was not investigated under the framework of PAP, because it adopts a dynamic population size. However, G-CMA-ES appears to be one of the state-of-the-art algorithms for numerical optimization (according to the technical report of CEC2005 competition [14]). Hence, comparison has also been made between the PAP instantiations and G-CMA-ES. To evaluate the effect of fine-tuning the migration parameters (i.e., $migration\_size$ and $migration\_interval$) of PAP, we carried out a sensitivity analysis involving 16 different settings of the parameters. Therefore, our experimental study altogether involved running 11×16 (the PAP instantiations with different migration parameters) + 4 (the basic algorithms) + 1 (G-CMA-ES) = 181 algorithms on 27 benchmark functions.
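The counts given above can be checked by enumerating the instantiations directly. This is a small Python sketch; the function name `pap_instantiations` and the variable names are chosen for the example only.

```python
from itertools import combinations

CONSTITUENTS = ["SaNSDE", "wPSO", "G3PCX", "CMA-ES"]


def pap_instantiations(algorithms):
    """All subsets containing at least two distinct constituent algorithms."""
    return [combo
            for r in range(2, len(algorithms) + 1)
            for combo in combinations(algorithms, r)]


instantiations = pap_instantiations(CONSTITUENTS)
n_migration_settings = 16  # pairs of migration parameters in the sensitivity analysis

# PAP variants + the four basic algorithms + G-CMA-ES
total_algorithms = len(instantiations) * n_migration_settings + len(CONSTITUENTS) + 1

if __name__ == "__main__":
    for combo in instantiations:
        print(" + ".join(combo))
    print(len(instantiations), total_algorithms)
```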

### A. Problem Set

The first 13 functions were selected from the classical benchmark functions used in [11], here denoted as $f_{1} -f_{13}$. The other 14 functions were selected from the benchmark functions of the special session on real-parameter optimization of the 2005 IEEE Congress on Evolutionary Computation (CEC2005) [12], denoted as $f_{cec1} - f_{cec14}$. These 27 functions span a diverse set of problem features, such as multimodality, ruggedness, ill-conditioning, interdependency, etc. They provided an ideal platform for our investigation on reducing risk on a large variety of problems. Short descriptions of these functions are presented in Tables I and II. More details of these functions can be found in [11] and [12]. In our experiments, all the functions were solved in 30 dimensions.

TABLE I CLASSICAL TEST FUNCTIONS USED IN THIS PAPER, INCLUDING A SHORT DESCRIPTION OF THEIR CHARACTERISTICS
TABLE II CEC2005 TEST FUNCTIONS USED IN THIS PAPER, INCLUDING A SHORT DESCRIPTION OF THEIR CHARACTERISTICS

### B. Experimental Settings

All the results presented in this paper were obtained by executing 30 independent runs for each experiment. Since we expect the PAP framework to be general enough so that alternative algorithms can be incorporated with little effort, it should not rely much on the refinement of the constituent algorithms. Hence, we did not fine-tune the parameters of the constituent algorithms to fit PAP. When implementing SaNSDE, we used all the parameter settings suggested in the original publication [7]. As suggested in [8], a linearly decreasing inertia weight over the course of the search is employed in our implementation of wPSO. The two coefficients of wPSO were both set to 1.49445. We assumed the researchers who proposed G3PCX and CMA-ES are in the best position to implement the two algorithms and fine-tune the parameters. Hence, we simply used the source code of G3PCX and CMA-ES provided by their authors (the codes are available online), and adopted the parameters suggested in the corresponding publications [9], [10]. There exist a few variants of the PCX operator. As suggested in [9], we chose the variant which employs the best individual in the population as the main parent for generating offspring. Furthermore, G3PCX and CMA-ES will terminate when some conditions are met. It is possible that they terminate before using up the allowed FEs. The remaining FEs are assigned to the other constituent algorithms. To make a fair comparison, when running G3PCX and CMA-ES alone, we restarted them in case the termination condition was met before the allowed FEs were used up.

For all the algorithms, the maximum number of FEs $(MAX\_{}FES)$ was set to 300000. All the PAP instantiations worked on a population of size 100, except for the one incorporating wPSO and CMA-ES. The population size of this instantiation was set to 50. The reason is that a PSO-type algorithm typically utilized a population size around 30 in the literature [38], [39], while 14 was recommended as the optimal population size for CMA-ES on 30-D problems [10], [37]. Hence, setting the combination of wPSO and CMA-ES to work on altogether 50 individuals is more consistent with the literature. When allocating FEs to the constituent algorithms, we simply followed a few ad hoc rules obtained from the original publications of the corresponding constituent algorithms. First, the population size of CMA-ES was set to 14 throughout the experiments. Second, a population of size 100 was most commonly adopted for problems of dimensionality 30 in the literature of DE [3], [7], while a population size smaller than 50 may lead to significant deterioration of performance. Hence, we always maintained a population size larger than or equal to 50 for SaNSDE. Third, since most studies on PSO utilized a population size of 20, 30 or 40, we kept the population size of wPSO at this level. Finally, all the remaining FEs were allocated to G3PCX. Table III presents the detailed information of the population/subpopulation sizes for all 11 PAP instantiations. In the table, SaNSDE, wPSO, G3PCX, and CMA-ES are denoted by DE, PSO, PCX, and ES for the sake of brevity. When running the basic algorithms alone, the population sizes of SaNSDE, wPSO, G3PCX, and CMA-ES were set to 100, 40, 100, and 14, respectively. The initial population size of G-CMA-ES was set to 14, and doubled for every restart.

TABLE III SIZES OF SUBPOPULATIONS ALLOCATED TO THE CONSTITUENT ALGORITHMS OF EACH INSTANTIATION OF PAP

In regard to the $migration\_{}size$ and $migration\_{}interval$, we considered four values for each of them and thus 16 pairs of parameters were tested. The $migration\_{}size$ was set to 1, 2, 4, or 8. The $migration\_{}interval$ was set to the maximum generation $(MAX\_{}GEN)$ divided by 20, 30, 40, or 50. After some preliminary experiments, we found that $migration\_{}size=1$ and $migration\_{}interval=MAX\_{}GEN/20$ generally worked well for all 11 instantiations, so they are recommended as the default values. For the sake of brevity, only the results obtained with this setting will be presented for comparison between PAP and other algorithms. Results obtained with other settings were mainly used for the sensitivity analysis.
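The 16 parameter pairs described above can be enumerated with a short sketch. It assumes that one generation consumes one evaluation per individual, so that $MAX\_{}GEN$ equals $MAX\_{}FES$ divided by the population size; this relation and the names `migration_grid` and `default_setting` are illustrative assumptions, not taken from the original implementation.

```python
from itertools import product

MAX_FES = 300_000   # evaluation budget stated in the text
POP_SIZE = 100      # population size used by most PAP instantiations

# Assumption: each generation costs POP_SIZE function evaluations.
MAX_GEN = MAX_FES // POP_SIZE

MIGRATION_SIZES = (1, 2, 4, 8)
INTERVAL_DIVISORS = (20, 30, 40, 50)


def migration_grid(max_gen=MAX_GEN):
    """Return every (migration_size, migration_interval) pair to be tested."""
    return [(size, max_gen // divisor)
            for size, divisor in product(MIGRATION_SIZES, INTERVAL_DIVISORS)]


grid = migration_grid()
# Recommended default: migration_size = 1, migration_interval = MAX_GEN / 20.
default_setting = (1, MAX_GEN // 20)
```

Under this assumption the default interval corresponds to one migration every 150 generations for a population of 100.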

### C. Experimental Results

Tables IV–VI present the results (in terms of solution quality)$^{1}$ obtained by the 16 algorithms in 30 independent runs. For each algorithm, the best, median, and worst results are given. According to the IEEE Standard for Floating-Point Arithmetic (IEEE 754), the total precision of the double precision floating-point format is 53 bits (approximately 16 decimal digits). Hence, directly carrying out further analyses (e.g., statistical tests) on the output of the program may introduce errors caused by the precision threshold of our computer. To deal with this issue, two solutions are regarded as the same if their function error values are both smaller than some predefined value-to-reach. By manually checking the results that each algorithm obtained in each run, we found that $1{\rm e}\hbox{--}{13}$ was the smallest value-to-reach that would not introduce arithmetic errors (as defined by the computer) into our further analyses. Therefore, we preprocessed the entries of Tables IV–VI with this value-to-reach, i.e., if an output of our computer program was smaller than $1{\rm e}\hbox{--}{13}$, we replaced it with 0.0. All analyses presented in the rest of this paper were also conducted based on the value-to-reach of $1{\rm e}\hbox{--}{\rm 13}$.
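The preprocessing step can be illustrated with a minimal sketch. The function name `apply_value_to_reach` is hypothetical; only the threshold and the replacement rule come from the text.

```python
VALUE_TO_REACH = 1e-13  # threshold used throughout the analyses


def apply_value_to_reach(errors, vtr=VALUE_TO_REACH):
    """Replace every function error value smaller than vtr with 0.0."""
    return [0.0 if error < vtr else error for error in errors]


# Two runs below the threshold become indistinguishable after preprocessing.
processed = apply_value_to_reach([3e-15, 8e-14, 2.5e-7])
```

After this step, any two solutions whose errors both fall below the threshold compare as equal, which is what the subsequent statistical tests rely on.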

TABLE IV FUNCTION ERROR VALUES OF THE SOLUTIONS OBTAINED BY SANSDE, wPSO, G3PCX, CMA-ES, G-CMA-ES AND THE 11 PAP INSTANTIATIONS ON $f_{1}$ to $f_{9}(D=30)$
TABLE V FUNCTION ERROR VALUES OF THE SOLUTIONS OBTAINED BY SANSDE, wPSO, G3PCX, CMA-ES, G-CMA-ES AND THE 11 PAP INSTANTIATIONS ON $f_{9}$ to $f_{13}$ and $f_{cec1}$ to $f_{cec5}(D=30)$
TABLE VI FUNCTION ERROR VALUES OF THE SOLUTIONS OBTAINED BY SANSDE, wPSO, G3PCX, CMA-ES, G-CMA-ES AND THE 11 PAP INSTANTIATIONS ON $f_{cec5}$ to $f_{cec1}(D=30)$

Two-sided Wilcoxon rank-sum tests with significance level 0.05 have been conducted to compare each PAP instantiation with its constituent algorithms and G-CMA-ES. Additional tests have also been conducted with two arbitrarily chosen values-to-reach, $1{\rm e}\hbox{--}{06}$ and $1{\rm e}\hbox{--}{02}$, to verify whether different values-to-reach lead to different conclusions. Table VII summarizes the results over the 27 test functions. By comparing the PAP instantiations to their constituent algorithms, we found that PAP outperformed its constituent algorithms in most cases. Taking the results obtained with the value-to-reach $1e\hbox{--}{13}$ as an example, negative results were observed only in four cases: wPSO+G3PCX, wPSO+CMA-ES, G3PCX+CMA-ES, and wPSO+G3PCX+CMA-ES. A more careful examination of the detailed results of the Wilcoxon tests revealed that wPSO and G3PCX were generally inferior to the other two basic algorithms, and on quite a few functions both of them performed poorly. We found that wPSO and G3PCX outperformed CMA-ES on only five ($f_{7}$, $f_{8}$, $f_{cec4}$, $f_{cec9}$, $f_{cec14}$) and two functions ($f_{8}$, $f_{cec14}$), respectively. In other words, the constituent algorithms in these four cases are not complementary. Hence, the success of PAP does depend on some kind of synergy between its constituent algorithms. The above observations also hold when the value-to-reach is set to $1{\rm e}\hbox{--}{02}$ and $1e\hbox{--}{06}$, and thereby support our expectation that PAP is capable of finding better solutions than its constituent algorithms. Moreover, looking at the last column of Table VII, the instantiations of PAP even showed competitive performance in comparison with G-CMA-ES. Of particular note is the combination of SaNSDE and G3PCX, which slightly outperformed G-CMA-ES. Since this instantiation does not take any advantage of CMA-ES, it clearly demonstrated that a combination of some relatively “weak” algorithms can be stronger than a state-of-the-art algorithm.
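For readers who wish to reproduce this kind of comparison, the following is a minimal sketch of a two-sided rank-sum test. It uses the normal approximation with a tie correction, which is one common variant; the text does not say which variant or software was actually used, so the helper names and these choices are assumptions. The tie correction matters here because many entries become exactly 0.0 after the value-to-reach preprocessing.

```python
import math


def average_ranks(values):
    """Return 1-based ranks (ties share their mean rank) and the tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def rank_sum_test(sample_a, sample_b):
    """Two-sided rank-sum test; returns (z, p) from a normal approximation."""
    n1, n2 = len(sample_a), len(sample_b)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(sample_a) + list(sample_b))
    w = sum(ranks[:n1])                      # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_w <= 0:                           # every observation is tied
        return 0.0, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided p-value
    return z, p


z_value, p_value = rank_sum_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
```

A negative `z_value` together with `p_value` below 0.05 would indicate that the first sample has significantly smaller (i.e., better) error values than the second.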

TABLE VII COMPARISON BETWEEN PAP INSTANTIATIONS AND THEIR CONSTITUENT ALGORITHMS AND G-CMA-ES ON THE 27 TEST FUNCTIONS (TWO-SIDED WILCOXON RANK-SUM TEST WITH SIGNIFICANCE LEVEL 0.05 WAS USED)

The risk associated with two algorithms can be compared by estimating the probability that one algorithm outperforms the other on the 27 functions. Since calculating the risk metric involves comparing the quality of two solutions, we also implemented it with the values-to-reach of $1{\rm e}\hbox{--}{13}$, $1{\rm e}\hbox{--}{06}$, and $1{\rm e}\hbox{--}{\rm 02}$. The calculated probabilities are presented in Table VIII. Again, we compared the PAP instantiations to their constituent algorithms as well as G-CMA-ES. In each cell, the first number is the estimated probability of the PAP instantiation outperforming the corresponding constituent algorithm, and the second number is the estimated probability of the constituent algorithm outperforming the corresponding PAP instantiation. For brevity, we omitted the probability that the two algorithms made a draw. If the first number is larger than the second one, we may say that the PAP is associated with less risk than the compared algorithm. For example, when comparing the PAP instantiation that combines SaNSDE with wPSO to SaNSDE, the statistic “0.30–0.24” indicates that PAP outperformed SaNSDE with a probability of 0.30, while it was outperformed by SaNSDE with a probability of 0.24. From Table VIII, similar patterns can be observed for all three precision levels. First, most PAP instantiations were associated with less risk than their constituent algorithms. It was not surprising to discover negative results on the combinations of G3PCX, wPSO, and CMA-ES, since CMA-ES is superior or comparable to the other two algorithms on almost all functions. Second, many PAP instantiations, such as the combination of SaNSDE and CMA-ES and the combination of all four basic algorithms, were associated with less risk than G-CMA-ES. Hence, the efficacy of the PAP framework with respect to risk reduction has been experimentally shown.
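One way to estimate such a pair of probabilities is sketched below. The metric itself is defined in Section III, which is not reproduced here, so the estimator is an assumption: on each function every run of one algorithm is compared with every run of the other, and the per-function estimates are averaged over the function set. All function names are hypothetical.

```python
def outperform_probability(runs_a, runs_b, vtr=1e-13):
    """Estimate (P[A better], P[B better]) on one function from all run pairs."""
    a = [0.0 if e < vtr else e for e in runs_a]  # value-to-reach preprocessing
    b = [0.0 if e < vtr else e for e in runs_b]
    total = len(a) * len(b)
    a_wins = sum(1 for x in a for y in b if x < y)
    b_wins = sum(1 for x in a for y in b if y < x)
    return a_wins / total, b_wins / total        # draws are the remainder


def risk_comparison(results_a, results_b, vtr=1e-13):
    """Average the per-function estimates over all functions."""
    pairs = [outperform_probability(ra, rb, vtr)
             for ra, rb in zip(results_a, results_b)]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(q for _, q in pairs) / n


# Toy data: two functions, two runs per algorithm (error values).
results_a = [[0.0, 1e-15], [1.0, 2.0]]
results_b = [[1e-14, 5e-14], [0.5, 3.0]]
prob_a_better, prob_b_better = risk_comparison(results_a, results_b)
```

On the first toy function all runs fall below the value-to-reach and count as draws, so neither algorithm gains probability mass there.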

TABLE VIII COMPARISON BETWEEN PAP INSTANTIATIONS AND THEIR CONSTITUENT ALGORITHMS AND G-CMA-ES IN TERMS OF RISK (USING THE METRIC PROPOSED IN Section III)

Although the results presented in Tables VII and VIII summarize the performance of PAP over the whole set of benchmark functions, these might not be sufficient to draw a complete picture of the performance of PAP. Taking a closer look at the evolutionary process of PAP on individual functions will give more insight into PAP's behaviors. Hence, we further carried out case studies on three selected functions, including $f_{9}$ (generalized Rastrigin's function), $f_{2}$ (Schwefel's problem 2.22), and $f_{cec12}$ (Schwefel's problem 2.13). Two PAP instantiations were considered in the case studies. First, we considered a PAP with only two constituent algorithms. This could make it easier to observe the relationship between the behaviors of PAP and its constituent algorithms. Since SaNSDE and CMA-ES generally performed better than wPSO and G3PCX, the PAP with SaNSDE and CMA-ES was a good choice for our study. The PAP with four constituent algorithms was also considered, because it incorporates the most constituent algorithms and outperformed all of them as well as G-CMA-ES, as shown in both Tables VII and VIII.

Figs. 2–4 present the evolutionary curves of the PAP with CMA-ES and SaNSDE on the three functions. For each algorithm, we sorted the 30 runs by the quality of the final solutions and picked out the median one. The corresponding evolutionary curve was then plotted with respect to the value-to-reach $1{\rm e}\hbox{--}{13}$ (since semi-log graphs were plotted here, all values smaller than this number were set to $1{\rm e}\hbox{--}{13}$ rather than 0 before being plotted). Three different scenarios can be observed from the figures. First, CMA-ES converged rapidly on $f_{9}$, but the solution obtained was not good. On the other hand, SaNSDE progressed much more slowly than CMA-ES, but continuously found better solutions and eventually arrived at the value-to-reach. By combining the two basic algorithms, the PAP progressed very fast at the beginning, which probably should be credited to CMA-ES. After that, it stagnated for a while. Once SaNSDE obtained a sufficiently good solution, the PAP started progressing again, and finally reached the value-to-reach as well. Note that the PAP always evolved faster than SaNSDE during the search. This illustrated that the PAP managed to take advantage of both CMA-ES and SaNSDE, and thus accelerated the evolutionary process. The scenario shown in Fig. 3 is similar to that of Fig. 2. In this case, CMA-ES converged faster than SaNSDE on $f_{2}$, while SaNSDE obtained a better solution. PAP always obtained better or comparable solutions in comparison with SaNSDE throughout the evolutionary process. Finally, Fig. 4 provides a negative case in which PAP did not achieve the best performance. On $f_{cec12}$, CMA-ES not only converged faster than SaNSDE, but also consistently obtained better solutions. Hence, there is hardly any advantage that the PAP could take from SaNSDE, and assigning computational resources to it was probably a waste. Moreover, we found that CMA-ES evolved very fast at the early stage and then stagnated for a long time. At the late stage of evolution, the solution quality improved with a sudden jump. Such an improvement was achieved after restarting CMA-ES many times. Since the PAP assigned a lot of computational resources to SaNSDE, it could not afford so many restarts for CMA-ES. Consequently, the PAP was inferior to CMA-ES both in terms of solution quality and convergence speed in this case. Nevertheless, the PAP always outperformed SaNSDE. Thus, employing PAP at least alleviated the risk of selecting the wrong algorithm.

Fig. 2. Evolutionary process of DE+ES, DE, and ES on function $f_{9}$. DE and ES stand for SaNSDE and CMA-ES, respectively. The value-to-reach was set to $1{\rm e}\hbox{--}{\rm 13}$.
Fig. 3. Evolutionary process of DE+ES, DE, and ES on function $f_{2}$. DE and ES stand for SaNSDE and CMA-ES, respectively. The value-to-reach was set to $1{\rm e}\hbox{--}{\rm 13}$.
Fig. 4. Evolutionary process of DE+ES, DE, and ES on function $f_{cec12}$. DE and ES stand for SaNSDE and CMA-ES, respectively. The value-to-reach was set to $1{\rm e}\hbox{--}{\rm 13}$.

Figs. 5–7 present the evolutionary process of the PAP with four constituent algorithms. The three figures essentially tell the same story as Figs. 2–4, and hence verify the previous analyses for PAP with more than two constituent algorithms.

Fig. 5. Evolutionary process of DE+PSO+PCX+ES and its constituent algorithms on function $f_{9}$. DE, PSO, PCX, and ES stand for SaNSDE, wPSO, G3PCX, and CMA-ES, respectively. The value-to-reach was set to $1{\rm e}\hbox{--}{\rm 13}$.
Fig. 6. Evolutionary process of DE+PSO+PCX+ES and its constituent algorithms on function $f_{2}$. DE, PSO, PCX, and ES stand for SaNSDE, wPSO, G3PCX, and CMA-ES, respectively. The value-to-reach was set to $1{\rm e}\hbox{--}{\rm 13}$.
Fig. 7. Evolutionary process of DE+PSO+PCX+ES and its constituent algorithms on function $f_{cec12}$. DE, PSO, PCX, and ES stand for SaNSDE, wPSO, G3PCX, and CMA-ES, respectively. The value-to-reach was set to $1{\rm e}\hbox{--}{\rm 13}$.

Two conclusions regarding the migration scheme of PAP can be drawn from the superiority of PAP over the compared algorithms. First, the migration scheme is of great importance to the success of PAP. Second, 1 and $MAX\_{}GEN/20$ are two appropriate and robust values for the parameters $migration\_{}size$ and $migration\_{}interval$. To further investigate the influence of these parameters on PAP, we carried out a sensitivity analysis to check whether the performance of PAP changes significantly with other parameter settings. As stated in Section IV-B, 16 different pairs of $migration\_{}interval$ and $migration\_{}size$ were tested for every instantiation of PAP. For each pair, 30 independent runs were executed on all 27 benchmark functions. Then, for each instantiation on each function, the Kruskal-Wallis one-way analysis of variance by ranks was employed to test whether the 16 pairs of parameters had led to significantly different performance. After that, for each instantiation of PAP, we counted the number of benchmark functions on which the 16 pairs of parameters made no significant difference. The larger the number, the less sensitive an instantiation is to the parameters. For the sake of brevity, we only summarize these numbers for the 11 PAP instantiations in Table IX and omit the full details. It can be observed that, in the worst case (SaNSDE+wPSO+G3PCX), the PAP instantiation is insensitive to the migration parameters on 16 out of 27 functions.

TABLE IX NUMBER OF FUNCTIONS ON WHICH 16 DIFFERENT SETTINGS OF THE MIGRATION PARAMETERS DID NOT RESULT IN STATISTICALLY DIFFERENT PERFORMANCE (IN TERMS OF SOLUTION QUALITY) OF THE PAP INSTANTIATIONS
SECTION V

## FURTHER ANALYSIS

The previous section empirically verified the efficacy of PAP. However, the reason why PAP performed well is still not fully understood. In Section II-C, we suggested that the constituent algorithms of PAP should be carefully chosen so that they are complementary. Is this really the key issue? Can we achieve comparable performance by simply running the same algorithm in parallel?

### A. Compare PAP With Parallel EAs

To answer the above question, we carried out an additional experiment to compare PAP with parallel EAs. The PAP with four constituent algorithms was chosen as the representative PAP instantiation in this experiment. The comparison was made between this PAP instantiation and the parallel versions of all four of its constituent algorithms, i.e., parallel SaNSDE (PDE), parallel wPSO (PPSO), parallel G3PCX (PPCX) and parallel CMA-ES (PES). All four parallel EAs were run on the same benchmark functions described in Section IV with $300\,000$ FEs. Each parallel EA maintained four subpopulations. For PDE, PPSO, PPCX, and PES, each subpopulation consisted of 25, 10, 26, and 14 individuals, respectively. According to previous experimental results, $migration\_{}size$ and $migration\_{}interval$ were set to 1 and $MAX\_{}GEN/20$ throughout the experiment.

Table X summarizes the comparison between PAP and the four parallel EAs on the 27 functions. The comparison was made based on 30 independent runs using two-sided Wilcoxon rank-sum tests and the risk metric. It can be observed that PAP outperformed all the compared parallel EAs. Hence, employing different constituent algorithms in PAP indeed boosted its performance.

TABLE X COMPARISON BETWEEN THE PAP INSTANTIATION WITH FOUR CONSTITUENT ALGORITHMS AND PARALLEL EAs

### B. On Choosing Constituent Algorithms for PAP

The previous section further demonstrated that the advantage of PAP was not due to parallelization, but to the use of different constituent algorithms. However, we also observed from Section IV that not all 11 PAP instantiations performed comparably. This difference in performance apparently lies in the different constituent algorithms they employed. Now the question is which type of algorithms we should use to implement a PAP. In Section II-C, we suggested that the constituent algorithms should be complementary to one another. The following analysis attempts to elaborate more on this issue.

We start from the case of two constituent algorithms, say $A_{1}$ and $A_{2}$. Our question is how we can properly choose $A_{1}$ and $A_{2}$ so that the PAP is associated with less risk in comparison to another algorithm $A^{\ast}$. Here, $A^{\ast}$ can be any algorithm, including $A_{1}$ and $A_{2}$.$^{2}$ Since the selection of $A_{1}$ and $A_{2}$ is irrelevant to $A^{\ast}$, seeking a PAP that is less risky than $A^{\ast}$ can be carried out by minimizing the probability that the PAP obtains a worse solution than $A^{\ast}$ over a problem set. We further assume that the PAP runs $A_{1}$ and $A_{2}$ independently (i.e., the migration scheme is omitted), and let $P_{1,k}$ and $P_{2,k}$ be the probabilities that $A_{1}$ and $A_{2}$ obtain a better solution than $A^{\ast}$. According to (5) and (6), the minimization problem takes the following form:

$$\min R=\frac{1}{n}\sum_{k=1}^{n}(1-P_{1,k})(1-P_{2,k})\eqno{\hbox{(8)}}$$

where $k$ is the index of problems. With some simple derivation, we can find that

$$\begin{aligned}
\frac{1}{n}\sum_{k=1}^{n}(1-P_{1,k})(1-P_{2,k})
&=1+\frac{1}{n}\sum_{k=1}^{n}P_{1,k}P_{2,k}-\bar{P_{1}}-\bar{P_{2}}\\
&=1-\bar{P_{1}}-\bar{P_{2}}+\frac{1}{n}\sum_{k=1}^{n}(P_{1,k}-\bar{P_{1}})(P_{2,k}-\bar{P_{2}})\\
&\qquad+\frac{1}{n}\bar{P_{1}}\sum_{k=1}^{n}P_{2,k}+\frac{1}{n}\bar{P_{2}}\sum_{k=1}^{n}P_{1,k}-\bar{P_{1}}\bar{P_{2}}\\
&=1-\bar{P_{1}}-\bar{P_{2}}+\bar{P_{1}}\bar{P_{2}}+\frac{1}{n}\sum_{k=1}^{n}(P_{1,k}-\bar{P_{1}})(P_{2,k}-\bar{P_{2}})\\
&=(1-\bar{P_{1}})(1-\bar{P_{2}})+\frac{1}{n}\sum_{k=1}^{n}(P_{1,k}-\bar{P_{1}})(P_{2,k}-\bar{P_{2}})
\end{aligned}\eqno{\hbox{(9)}}$$

where $\bar{P_{1}}=\frac{1}{n}\sum_{k=1}^{n}P_{1,k}$ and $\bar{P_{2}}=\frac{1}{n}\sum_{k=1}^{n}P_{2,k}$.
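The identity in (9) can be checked numerically. The sketch below draws arbitrary probabilities for 27 hypothetical problems and compares the direct form of $R$ with the decomposed form; the random values are placeholders and do not correspond to any measured results.

```python
import random


def risk_direct(p1, p2):
    """R as in (8): mean over problems of (1 - P1k)(1 - P2k)."""
    n = len(p1)
    return sum((1 - a) * (1 - b) for a, b in zip(p1, p2)) / n


def risk_decomposed(p1, p2):
    """R as in the last line of (9): product of mean failures plus covariance."""
    n = len(p1)
    mean1 = sum(p1) / n
    mean2 = sum(p2) / n
    covariance = sum((a - mean1) * (b - mean2) for a, b in zip(p1, p2)) / n
    return (1 - mean1) * (1 - mean2) + covariance


rng = random.Random(0)                       # fixed seed for repeatability
p1 = [rng.random() for _ in range(27)]       # placeholder values of P_{1,k}
p2 = [rng.random() for _ in range(27)]       # placeholder values of P_{2,k}
gap = abs(risk_direct(p1, p2) - risk_decomposed(p1, p2))
```

The two forms agree up to floating-point rounding, so `gap` is expected to be on the order of machine precision.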

Taking a closer look at the above derivations, we can find that larger $\bar {P_{1}}$ and $\bar {P_{2}}$ generally lead to a smaller $R$, which means $A_{1}$ and $A_{2}$ should perform sufficiently well on the problem set. Meanwhile, the term $\sum _{k=1}^{n}(P_{1,k}-\bar {P_{1}})(P_{2,k}-\bar {P_{2}})$ needs to be as small as possible. This observation implies that we want $(P_{1,k}-\bar {P_{1}})$ to be negative whenever $(P_{2,k}-\bar {P_{2}})$ is positive, and vice versa. More precisely, the performance of $A_{1}$ on a problem should be above its “average” performance over the problem set when the performance of $A_{2}$ is below its “average” performance, and vice versa. This elaborates the term “complementary” that was briefly mentioned in Section II-C.
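A small worked example may make the role of the covariance term concrete. The probabilities below are invented for illustration: both candidate partners for $A_{1}$ have the same average probability of 0.5, but one is strong where $A_{1}$ is weak and the other is strong on the same problems as $A_{1}$.

```python
def risk(p1, p2):
    """R from (8) for two independently run algorithms."""
    n = len(p1)
    return sum((1 - a) * (1 - b) for a, b in zip(p1, p2)) / n


# Hypothetical probabilities of beating A* on four problems.
a1 = [0.9, 0.9, 0.1, 0.1]
complementary_partner = [0.1, 0.1, 0.9, 0.9]   # strong where a1 is weak
similar_partner = [0.9, 0.9, 0.1, 0.1]         # strong where a1 is strong

risk_complementary = risk(a1, complementary_partner)
risk_similar = risk(a1, similar_partner)
```

With these numbers the complementary pairing gives $R=0.09$, whereas the similar pairing gives $R=0.41$, even though the mean probabilities are identical in both cases.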

The above analysis can be generalized to more than two constituent algorithms by considering the algorithm selection process in a sequential manner. For the $m$-algorithm case, one first identifies two algorithms $A_{1}$ and $A_{2}$ as a basis, followed by seeking the remaining algorithms one by one. Suppose we have selected $i$ algorithms and now need to identify an additional algorithm, say $A_{i+1}$. We may regard the combination of all the previously selected algorithms as a single algorithm $A_{c}$. A potentially good $A_{i+1}$ should then not only be good by itself, but also be complementary to $A_{c}$. Viewing the $m$ algorithms in a sequential manner also helps us understand the limits of PAP. Specifically, when there are only two constituent algorithms, one can easily identify two complementary algorithms that both generally perform well. As the number of algorithms increases, the performance of PAP is expected to improve as long as new good constituent algorithms can be included. However, such improvement makes it more and more difficult to identify the next satisfactory constituent algorithm. Therefore, the performance of PAP will eventually start decreasing because of the inclusion of some undesirable constituent algorithms. In other words, there must be some kind of “upper bound” on PAP's performance improvement. A rigorous analysis along this line is nontrivial, and deserves in-depth investigation in the future. Moreover, we also need to analyze the effect of migration in the future.
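The sequential view can be sketched as a greedy procedure. This is an illustration only, not a method proposed in the paper: it assumes the algorithms run independently, treats each $P_{i,k}$ as fixed regardless of how many algorithms share the time budget (which ignores the budget-splitting effect discussed in footnote 2 and therefore cannot reproduce the eventual performance decrease), and uses hypothetical names and probabilities.

```python
from itertools import combinations


def portfolio_risk(prob_rows):
    """Mean over problems of the product of (1 - P_{i,k}) across algorithms."""
    n = len(prob_rows[0])
    total = 0.0
    for k in range(n):
        miss = 1.0
        for row in prob_rows:
            miss *= 1.0 - row[k]
        total += miss
    return total / n


def greedy_select(candidates, m):
    """Pick the best pair first, then add one algorithm at a time."""
    names = list(candidates)
    best_pair = min(combinations(names, 2),
                    key=lambda pair: portfolio_risk([candidates[x] for x in pair]))
    chosen = list(best_pair)
    while len(chosen) < m:
        remaining = [x for x in names if x not in chosen]
        best_next = min(
            remaining,
            key=lambda x: portfolio_risk([candidates[c] for c in chosen]
                                         + [candidates[x]]))
        chosen.append(best_next)
    return chosen


# Hypothetical per-problem probabilities of beating A*.
candidates = {
    "A": [0.9, 0.9, 0.1, 0.1],
    "B": [0.8, 0.8, 0.2, 0.2],
    "C": [0.1, 0.1, 0.9, 0.9],
}
pair = greedy_select(candidates, 2)
triple = greedy_select(candidates, 3)
```

In this toy setting the procedure pairs A with C rather than with the individually stronger B, because C is complementary to A.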

Although our theoretical analysis has been simplified, it actually explains the experimental results quite well. Take SaNSDE and CMA-ES as an example: both exhibit better performance than G3PCX and wPSO. According to the Wilcoxon test, we found that SaNSDE achieved significantly better solutions than CMA-ES on $f_{7}$, $f_{8}$, $f_{9}$, $f_{cec4}$, $f_{cec5}$, $f_{cec9}$, and $f_{cec14}$, while CMA-ES performed much better than SaNSDE on $f_{cec3}$, $f_{cec7}$, $f_{cec8}$, $f_{cec10}$, $f_{cec11}$, and $f_{cec12}$. In other words, the two algorithms favor quite different functions. When employing them as the constituent algorithms, the resulting PAP not only beat both SaNSDE and CMA-ES themselves, but also outperformed G-CMA-ES on 10 out of 27 functions. In contrast, the PAP instantiation using wPSO and G3PCX appeared to be an obvious negative example. G3PCX and wPSO concurrently failed on many functions, e.g., $f_{9}$, $f_{11}$, $f_{cec1}$, $f_{cec2}$, $f_{cec5}$, $f_{cec6}$, $f_{cec10}$, and $f_{cec13}$. Consequently, the PAP instantiation employing these two algorithms was not even as good as wPSO. The above two examples demonstrate that our theoretical analysis, at least to some extent, can be used to explain experimental results.

### C. Can PAP Increase the Probability of Finding the Global Optimum?

So far, all the experimental studies compared the quality of the solutions obtained by an algorithm. The attractive performance shown by PAP essentially indicated that PAP managed to get good solutions within a given time budget. However, a good solution might be a local optimum. Hence, the evidence presented in the previous sections did not necessarily show that PAP is capable of improving the probability of finding the global optimum. Investigating PAP from this perspective will provide us with a more comprehensive understanding of its characteristics. In practice, we may measure the probability of finding a solution that reaches the precision threshold of the computer hardware. In the context of this paper, this was done by calculating the rate at which an algorithm reached the value-to-reach $1{\rm e}\hbox{--}{13}$ in 30 independent runs.
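The rate described above can be computed as in the following sketch. It assumes that a run counts as successful when its final function error value is smaller than the value-to-reach, consistent with the preprocessing rule used earlier; the function names and the toy error values are hypothetical.

```python
def success_rate(final_errors, vtr=1e-13):
    """Fraction of runs whose final function error value is below vtr."""
    hits = sum(1 for error in final_errors if error < vtr)
    return hits / len(final_errors)


def average_success_rate(errors_per_function, vtr=1e-13):
    """Mean of the per-function success rates over a set of functions."""
    rates = [success_rate(errors, vtr) for errors in errors_per_function]
    return sum(rates) / len(rates)


# Toy data: final errors of four runs on each of two functions.
toy_errors = [
    [0.0, 5e-14, 1e-3, 2.0],   # two of four runs reach the threshold
    [0.0, 0.0, 0.0, 0.0],      # all runs reach the threshold
]
toy_average = average_success_rate(toy_errors)
```

For the toy data the per-function rates are 0.5 and 1.0, giving an average rate of 0.75.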

Eight algorithms, including three PAP instantiations, the four basic algorithms and G-CMA-ES were used in this investigation. The PAP instantiation employing SaNSDE, wPSO, and CMA-ES as its constituent algorithms was chosen as the representative PAP with three constituent algorithms. The PAP instantiation with SaNSDE and CMA-ES was chosen as the representative PAP with two constituent algorithms. In addition, we also considered the PAP instantiation with four constituent algorithms.

Table XI presents the rates of the algorithms attaining the value-to-reach in 30 independent runs. These rates were calculated based on results of the experiments described in Section IV. It can be observed that the PAP instantiation with four constituent algorithms was overall the best among the compared algorithms. It achieved the highest average rate of 0.57. The average rates of the other two PAP instantiations were both 0.52, while the average rates of SaNSDE, wPSO, G3PCX, CMA-ES, and G-CMA-ES were 0.42, 0.26, 0.15, 0.45, and 0.46. In comparison with their constituent algorithms, the PAP instantiations achieved higher or equal rates on 23, 24, and 23 functions, respectively. Therefore, the efficacy of PAP was again demonstrated.

TABLE XI COMPARISON AMONG THREE PAP INSTANTIATIONS, THE FOUR BASIC ALGORITHMS AND G-CMA-ES IN TERMS OF PROBABILITY OF REACHING THE VALUE-TO-REACH OF 1e-13
SECTION VI

## CONCLUSION

This paper investigated solving numerical optimization problems for which solutions must be presented within a limited time budget. Although numerous algorithms are readily applicable for this type of problems, their performance usually varies significantly from problem to problem. This implies that there is an inherent risk associated with the selection of an algorithm. Unfortunately, identifying a suitable (or optimal) algorithm for a specific problem is a nontrivial task due to the lack of prior knowledge. The limited time budget also prohibits us from trying out different algorithms and then choosing the best one. Instead of betting the entire time budget on a single algorithm, we proposed that such a risk can be reduced by distributing the time budget to multiple algorithms. Based on this idea, a general framework called PAP has been proposed in the context of population-based search algorithms. PAP typically consists of a number of constituent algorithms, each of which is allowed to run with a portion of the time budget. Allocation of computation time is implemented by dividing the whole population into a number of subpopulations, and maintaining one for each constituent algorithm. To further boost the performance, interaction among constituent algorithms is carried out through regularly migrating individuals among the subpopulations. We proposed a pairwise metric to compare the risks associated with two algorithms. Such a metric can be used to evaluate how effective our PAP is, together with other common metrics. Given a set of functions, the proposed metric essentially measures how likely it is that an algorithm will find a better solution than another algorithm by the end of a given time budget.

To evaluate the effectiveness of PAP, 11 instantiations of PAP were implemented based on four existing constituent algorithms, namely SaNSDE, wPSO, G3PCX, and CMA-ES. The performance of each instantiation was compared to its constituent algorithms on 27 benchmark functions. Our experimental results showed that seven out of the 11 PAP instantiations outperformed their constituent algorithms in terms of solution quality and the proposed risk metric. Furthermore, seven out of the 11 instantiations even achieved superior or comparable performance in comparison with G-CMA-ES, which was known to be superior to any of the four constituent algorithms. Our empirical studies also revealed that PAP is capable of increasing the probability of finding the global optimum and is insensitive to the control parameters of the migration scheme. Further analyses have been conducted to investigate under what circumstances PAP may outperform its constituent algorithms. Complementarity was identified as a key issue.

Though PAP has been shown to be a promising framework, the resource (time) allocation strategy utilized in this paper was somewhat ad hoc: We manually allocated the resource before running the PAP. When the optimization task becomes tougher, either in terms of the inherent difficulty of the problem or in terms of an extremely limited time budget, fixed allocation of computational resources might be too rigid to guarantee good PAP performance. Therefore, an adaptive allocation strategy deserves further investigation. We will address this issue in the future.

### ACKNOWLEDGMENT

The authors would like to thank Dr. T. Weise for proofreading the manuscript.

## Footnotes

This paper was partially supported by the National Natural Science Foundation of China under Grants 60533020, 60802036 and U0835002, the Fund for Foreign Scholars in University Research and Teaching Programs in China under Grant B07033, and the Engineering and Physical Science Research Council in U.K. under Grant EP/D052785/1 on “SEBASE: Software Engineering By Automated Search.”

F. Peng and K. Tang are with the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China (e-mail: pzbhlp@mail.ustc.edu.cn, ketang@ustc.edu.cn).

G. Chen is with the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China. He is also with the National High-Performance Computing Center, Hefei 230027, China (e-mail: glchen@ustc.edu.cn).

X. Yao is with the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027, China. He is also with the Center of Excellence for Research in Computational Intelligence and Applications, School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K. (e-mail: x.yao@cs.bham.ac.uk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

$^{1}$ All test functions used in this paper are minimization problems. Thus, we measure the quality of a solution ${\bf x}$ via its function error value $f({\bf x})-f({\bf x}^{\ast})$, where ${\bf x}^{\ast}$ is the global optimum of $f$. When discussing the solution quality of an algorithm, we refer to the quality of the best solution (i.e., the final solution) obtained by the algorithm.

$^{2}$ If we spend all the computation time on $A_{1}$ (or $A_{2}$), in most cases, the solution obtained will be better than the solution obtained by assigning only a part of the time budget to it (as we do in PAP). Hence, the $A_{1}$ ($A_{2}$) in the two scenarios can be regarded as two different algorithms.
