
SECTION I

INTRODUCTION

ESTIMATION of distribution algorithms (EDAs) [25], [28] are population-based stochastic algorithms that incorporate learning into optimization. Unlike evolutionary algorithms (EAs), which rely on variation operators to produce offspring, EDAs create offspring by sampling a probabilistic model learned during the optimization process. Obviously, the performance of an EDA depends on how well the probabilistic model, which tries to estimate the distribution of the optimal solutions, has been learned. The general procedure of EDAs is summarized in Table I. In recent years, many variants of EDAs have been proposed. On one hand, they have been shown experimentally to outperform other existing algorithms on many benchmark test functions. On the other hand, there have also been experimental observations showing that EDAs do not scale well to large problems. In spite of the large number of experimental studies, theoretical analyses of EDAs have been rare, especially analyses of their computational time complexity.

The importance of the time complexity of EDAs has been recognized by several researchers. Mühlenbein and Schlierkamp-Voosen [31] studied the convergence time of constant selection intensity algorithms on the OneMax function. Later, Mühlenbein [27] studied the response to selection equation of the univariate marginal distribution algorithm (UMDA) on the OneMax function through experiments as well as theoretical analysis. Pelikan et al. [32] studied the convergence time of the Bayesian optimization algorithm on the OneMax function. Rastegar and Meybodi [35] carried out a theoretical study of the global convergence time of a limit model of EDAs using drift analysis, but they did not investigate any relation between the problem size and the computation time of EDAs. In addition to convergence time, the time complexity of EDAs can be measured by the first hitting time (FHT), which is defined as the first time at which a stochastic optimization algorithm reaches the global optimum. Although recent work has pointed out the significance of studying the FHT of EDAs [29], [33], few results have been reported. Droste's results [8] on the compact genetic algorithm (cGA) are a rare example: he rigorously analyzed the FHT of the cGA with population size 2 [14] on linear functions. The other example is González's doctoral dissertation [13], in which she analyzed the FHT of EDAs on pseudo-boolean injective functions using the analytical Markov chain framework proposed by He and Yao [17]. González [13] proved the important result that the worst-case mean FHT is exponential in the problem size for four commonly used EDAs. However, no specific problem was analyzed theoretically. Instead, González et al. [10] studied experimentally the mean FHT of three different types of EDAs, including the UMDA, on the Linear function, the LeadingOnes function [4], [7], [16], [37], and the UniMax (long-path) function [22].

TABLE I GENERAL PROCEDURE OF EDA
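Since the body of Table I is not reproduced here, the following minimal Python sketch illustrates the general sample-select-estimate loop that the table describes. It is our own illustration rather than the authors' pseudocode; the function names (`sample`, `select`, `estimate`) and the toy instantiation are placeholders.

```python
import random

def eda(sample, select, estimate, model, generations):
    """Skeleton of the general EDA procedure summarized in Table I:
    iterate sampling, selection, and model re-estimation."""
    for t in range(generations):
        population = sample(model)     # sample offspring from the current model
        parents = select(population)   # keep the better individuals
        model = estimate(parents)      # learn a new probabilistic model
    return model

# Toy instantiation: univariate marginals over 5-bit strings, fitness = sum(x).
n, N, M = 5, 100, 50
final = eda(
    sample=lambda p: [[int(random.random() < p[i]) for i in range(n)]
                      for _ in range(N)],
    select=lambda pop: sorted(pop, key=sum, reverse=True)[:M],
    estimate=lambda sel: [sum(x[i] for x in sel) / M for i in range(n)],
    model=[0.5] * n,
    generations=30,
)
print(final)   # the marginals should drift toward 1.0 under this selection
```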

This paper concerns theoretical analysis of the FHT of EDAs on optimization problems with a unique global optimum. First, we provide a classification of problem hardness based on the FHT of EDAs, so that we can relate problem characteristics to EDAs. This is very important for investigating the principles of when to use which EDA for a given problem. Given such a classification (with respect to an EDA), we then investigate the relationship between EDAs' probability conditions and problem hardness. Specifically, the time complexity of a simple EDA, the UMDA with truncation selection, is analyzed on two unimodal problems. The first is the LeadingOnes problem [37], which has frequently been studied in the field of time complexity analysis of EAs [7], [16], [17], [18]. The other is a variant of LeadingOnes, namely BVLeadingOnes.

Our analysis can briefly be summarized from two aspects. First, we propose a general approach to the time complexity analysis of EDAs with finite populations. In the domain of EDAs, many theoretical results are based on the infinite population assumption (e.g., [3], [11], [45]), while few consider the more realistic scenario of finite populations. Though we restrict our analysis to the UMDA, our approach may also be useful for other EDAs. Second, both LeadingOnes and BVLeadingOnes are unimodal problems, and hence are usually expected to be easy for EDAs [11]. Our analysis confirms that LeadingOnes is easy for the UMDA studied. Interestingly, however, we find that BVLeadingOnes is hard for the UMDA. To deal with this issue, we relax the UMDA by adopting the so-called margins, and prove that BVLeadingOnes becomes easy for this relaxed version of the UMDA.

The rest of the paper is organized as follows. Section II discusses why the FHT is more appropriate for time complexity analysis of EDAs and presents the classification of problem hardness and the corresponding probability conditions for EDAs. Section III presents the new approach to analyzing EDAs with finite populations and describes the UMDA studied in this paper. The UMDA is then analyzed on the LeadingOnes and BVLeadingOnes problems in Sections IV and V, respectively. Section VI studies the relaxed form of the UMDA on the BVLeadingOnes problem. Finally, Section VII concludes the paper.

SECTION II

TIME COMPLEXITY MEASURES FOR EDAS

A. How to Measure the Time Complexity of EDAs

The concept of "convergence" is often used to measure the limiting behaviors of EAs, including EDAs; it was derived from the concept of convergence of random sequences [37]. For EDAs, the following formal definition of "convergence" was given by Zhang and Mühlenbein [45]:

If $\lim_{t\to\infty}\bar{F}(t)=g^{\ast}$ holds for a given EDA, where $\bar{F}(t)$ is the average fitness of individuals in the $t$th generation and $g^{\ast}$ is the fitness of the global optimum, then we say that the EDA converges to the global optimum.

There has been some work concerning such convergence of EDAs [12], [30]. It is worth noting that the above definition of convergence requires all individuals of a population to reach the global optimum. If we assume that an EDA on a problem converges to the global optimum, we can then measure the EDA's time complexity using the minimal number of generations needed for it to converge. This concept is called the convergence time (CT), denoted by $T$ in this paper. For EDAs, the CT is formally defined by
$$T\triangleq\min\left\{t;\;p\left(x^{\ast}\mid\xi_{t}^{(s)}\right)=1\right\}\tag{1}$$
where $x^{\ast}$ is the global optimum of a given problem, $\xi_{t}^{(s)}$ is the population after selection at the $t$th generation, and $p\left(x^{\ast}\mid\xi_{t}^{(s)}\right)$ is the probability (estimated by the EDA at the $t$th generation) of generating $x^{\ast}$.

In addition to the CT, the FHT is a commonly used concept for measuring the time complexity of EAs [16], [17]. The FHT [16], [17], [43], denoted by $\tau$, is defined for the general procedure of EDA shown in Table I as
$$\tau\triangleq\min\left\{t;\;x^{\ast}\in\xi_{t+1}\right\}\tag{2}$$
where $\xi_{t+1}$ is the population generated at the end of the $t$th generation. In the domain of EAs, the FHT records the smallest number of generations needed to find the optimum, which is smaller by a factor of $N$ than another commonly used measure, the number of fitness evaluations, where $N$ is the number of fitness evaluations in every generation [9]. As González pointed out in [13], the FHT can also be used to measure the time complexity of EDAs.

Since EDAs are stochastic algorithms, both the CT $T$ and the FHT $\tau$ are random variables. Since the FHT measures the time at which the global optimum is found for the first time, the CT is no smaller than the FHT:
$$T\geq\tau\tag{3}$$
which suggests a natural way to bound the CT from below by the FHT, or the FHT from above by the CT.

In practical optimization, we are most interested in the time spent in finding the global optimum, not in waiting for the whole population to converge to it. Hence, the FHT is a better measure for analyzing the time complexity of EDAs. It is worth noting that a given EDA on a problem may have a small FHT but a large CT. In other words, the population may take a long time (even an infinite time) to converge to the global optimum. In such cases, the analysis of the FHT is still valid while the analysis of the CT is rather uninteresting. It is possible that an EDA finds the global optimum efficiently (in polynomial time), but the population does not converge to the global optimum. We will discuss such an example in Section VI.

B. Probability Conditions for EDA-Hardness

In order to better understand the relationship between the problem characteristics and the algorithmic features of an EDA, we introduce a problem classification for a given EDA. First, however, we introduce some notation.

Denote by $Poly(n)$ the class of polynomial functions of the problem size $n$, and by $SuperPoly(n)$ the class of super-polynomial functions of the problem size $n$. For a function $f(n)$ (where $f(n)>1$ always holds, and $f(n)\to\infty$ as $n\to\infty$), denote the following (a worked instance is given after the two definitions below):

  1. $f(n)\prec Poly(n)$ and $g(n)=\frac{1}{f(n)}\succ\frac{1}{Poly(n)}$ if and only if $\exists a,b\in\mathbb{R}^{+}$, $n_{0}\in\mathbb{N}$: $\forall n>n_{0}$, $f(n)\leq an^{b}$;
  2. $f(n)\succ SuperPoly(n)$ and $g(n)=\frac{1}{f(n)}\prec\frac{1}{SuperPoly(n)}$ if and only if $\forall a,b\in\mathbb{R}^{+}$, $\exists n_{0}\in\mathbb{N}$: $\forall n>n_{0}$, $f(n)>an^{b}$.
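As a concrete illustration of the notation above (our own example, not from the original text): $f(n)=n^{2}\ln n$ satisfies $f(n)\prec Poly(n)$, since $n^{2}\ln n\leq n^{3}$ for all $n>1$ (take $a=1$, $b=3$); whereas $f(n)=2^{\sqrt{n}}$ satisfies $f(n)\succ SuperPoly(n)$, since for any $a,b\in\mathbb{R}^{+}$ we have $2^{\sqrt{n}}>an^{b}$ once $n$ is sufficiently large.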

Based on the above definitions, we know that "$\prec$" and "$\succ$" imply "$<$" and "$>$," respectively, when $n$ is sufficiently large. $Poly(n)$ [$SuperPoly(n)$] implies that there exists a monotonically increasing function that is polynomial (super-polynomial) in the problem size $n$. Note that $g(n)=\frac{1}{f(n)}\in(0,1)$, and its asymptotic form, $g(n)\succ\frac{1}{Poly(n)}$ or $g(n)\prec\frac{1}{SuperPoly(n)}$, can be used to measure the asymptotic order of a probability (e.g., the probability of generating a certain individual), since a probability always takes its value in the interval $[0,1]$. We now provide the following problem classification for a given EDA.

  1. EDA-easy Class. For a given EDA, a problem is EDA-easy if, and only if, with probability $1-\frac{1}{SuperPoly(n)}$, the FHT needed to reach the global optimum is polynomial in the problem size $n$.
  2. EDA-hard Class. For a given EDA, a problem is EDA-hard if, and only if, with probability $\frac{1}{Poly(n)}$, the FHT needed to reach the global optimum is super-polynomial in the problem size $n$.

The above classification can be considered as a direct generalization of the following EA-hardness classification for EAs proposed by He and Yao [18].

  1. EA-easy Class. For a given EA, a problem is EA-easy if, and only if, the mean FHT needed to reach the global optimum is polynomial in the problem size $n$.
  2. EA-hard Class. For a given EA, a problem is EA-hard if, and only if, the mean FHT needed to reach the global optimum is super-polynomial in the problem size $n$.

We see that He and Yao's classification for EAs is based on the mean FHT, while our classification for EDAs concerns more detailed characteristics of the probability distribution of the FHT. Given a problem, if the FHT of an EDA is polynomial with a probability super-polynomially close to 1 (called "an overwhelming probability" in the remainder of the paper), then we can say that in most independent runs the EDA finds the optimum of the problem efficiently. On the other hand, if the FHT of an EDA is super-polynomial with a probability that is polynomially large, i.e., $\frac{1}{Poly(n)}$, then it is very likely that the EDA cannot find the optimum of the problem efficiently. A similar idea can be found in [42], which defined efficiency measures for randomized search heuristics.

From the definition of expectation in probability theory, we know that, for an algorithm, the problems belonging to the EDA-hard class in our classification will still be hard under a classification based on the mean FHT. But our classification defines EDA-easy differently from a classification based on the mean FHT. In practice, it is possible that an EDA finds the optimum efficiently in most independent runs, while spending an extremely long time in the remaining runs. Such problems would be considered "hard" cases if the mean FHT were used for classification. In our classification, however, they are considered easy cases, which is more likely to fit the practitioner's point of view.

We now establish conditions under which a problem is EDA-hard (or EDA-easy) for a given EDA. Let $\mathbb{P}(\tau=t)$ $(t\in\mathbb{N})$ be the probability distribution of the FHT, which is determined by the probabilistic model at the $t$th generation. An EDA can be regarded as a random process $K=\{K_{t}\colon t\in\mathbb{N}\}$, where $K_{t}$ is the probabilistic model (including its parameters) maintained at the $t$th generation. Obviously, $K_{t}$ determines the probability of generating the global optimum in one sampling at the $t$th generation, denoted by $P_{t}^{\ast}$:
$$\forall t\in\mathbb{N}\colon\;K_{t}\vdash P_{t}^{\ast}.\tag{4}$$

Meanwhile, to obtain the probability distribution of the FHT $\tau$, we let $P_{t}^{\prime}$ be the probability of generating the global optimum in one sampling at the $t$th generation, conditional on the event $\tau\geq t$ (i.e., the global optimum has not been generated before the $t$th generation). Consequently, we obtain the following lemma.

Lemma 1

The probability distribution of the FHT $\tau$ satisfies
$$\forall t\geq 0\colon\;\mathbb{P}(\tau=t)=\left(1-\left(1-P_{t}^{\prime}\right)^{N}\right)\prod_{j=0}^{t-1}\left(1-P_{j}^{\prime}\right)^{N}.\tag{5}$$

Proof

Let $x^{\ast}$ be the global optimum. As in Table I and (2), let $\xi_{t+1}$ be the population generated at the end of the $t$th generation $(t\in\mathbb{N})$. According to the FHT defined in (2), for any $t\in\mathbb{N}^{+}$ we have
$$\begin{aligned}
\mathbb{P}(\tau=t)&=\mathbb{P}\left(x^{\ast}\in\xi_{t+1},\,x^{\ast}\notin\xi_{t},\ldots,x^{\ast}\notin\xi_{2},\,x^{\ast}\notin\xi_{1}\right)\\
&=\mathbb{P}\left(x^{\ast}\in\xi_{t+1},\,x^{\ast}\notin\xi_{t},\ldots,x^{\ast}\notin\xi_{2}\mid x^{\ast}\notin\xi_{1}\right)\cdot\mathbb{P}\left(x^{\ast}\notin\xi_{1}\right)\\
&=\mathbb{P}\left(x^{\ast}\in\xi_{t+1},\,x^{\ast}\notin\xi_{t},\ldots,x^{\ast}\notin\xi_{3}\mid x^{\ast}\notin\xi_{2},\,x^{\ast}\notin\xi_{1}\right)\cdot\mathbb{P}\left(x^{\ast}\notin\xi_{2}\mid x^{\ast}\notin\xi_{1}\right)\mathbb{P}\left(x^{\ast}\notin\xi_{1}\right)\\
&=\mathbb{P}\left(x^{\ast}\in\xi_{t+1}\mid x^{\ast}\notin\xi_{t},\ldots,x^{\ast}\notin\xi_{1}\right)\mathbb{P}\left(x^{\ast}\notin\xi_{1}\right)\cdot\prod_{j=1}^{t-1}\mathbb{P}\left(x^{\ast}\notin\xi_{j+1}\mid x^{\ast}\notin\xi_{j},\ldots,x^{\ast}\notin\xi_{1}\right)\\
&=\mathbb{P}\left(x^{\ast}\in\xi_{t+1}\mid\tau\geq t\right)\prod_{j=0}^{t-1}\mathbb{P}\left(x^{\ast}\notin\xi_{j+1}\mid\tau\geq j\right)\\
&=\left(1-\left(1-P_{t}^{\prime}\right)^{N}\right)\prod_{j=0}^{t-1}\left(1-P_{j}^{\prime}\right)^{N}
\end{aligned}$$
where $N$ is the population size, the term $1-\left(1-P_{t}^{\prime}\right)^{N}$ is the probability that the optimum is found at the $t$th generation, conditional on the event $\tau\geq t$, and the term $\prod_{j=0}^{t-1}\left(1-P_{j}^{\prime}\right)^{N}$ is the probability that the optimum has not been found before the $t$th generation. Combining the above result with the fact that $\mathbb{P}(\tau=0)=1-\left(1-P_{0}^{\prime}\right)^{N}$, we have proven the lemma. ■
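Equation (5) is straightforward to evaluate numerically once the conditional probabilities $P'_{t}$ are known. The following Python sketch (an illustration under the assumption that a sequence of $P'_{t}$ values is supplied; it is not part of the original analysis) computes the distribution of $\tau$:

```python
def fht_distribution(p_cond, N):
    """P(tau = t) per eq. (5): p_cond[t] is P'_t, the probability of generating
    the optimum in one sampling at generation t, conditional on tau >= t."""
    not_yet = 1.0                    # prod_{j<t} (1 - P'_j)^N
    dist = []
    for p in p_cond:
        dist.append(not_yet * (1.0 - (1.0 - p) ** N))
        not_yet *= (1.0 - p) ** N
    return dist

# Example: constant P'_t = 0.001 and N = 100 gives a geometrically decaying tail.
print(fht_distribution([0.001] * 5, N=100))
```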

Moreover, let us consider the following lemma:

Lemma 2

If $\mathbb{P}(\tau\prec Poly(n))\succ 1-\frac{1}{SuperPoly(n)}$, then $\exists t^{\prime}\leq\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1$ such that
$$\mathbb{P}(\tau=t^{\prime})\succ\frac{1}{Poly(n)}.$$

Proof

Assume that $\forall t\leq\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1$, $\mathbb{P}(\tau=t)\prec\frac{1}{SuperPoly(n)}$; then we know that
$$\max\left\{\mathbb{P}(\tau=t);\;t\leq\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1\right\}\prec\frac{1}{SuperPoly(n)}.$$
Hence, we can obtain
$$\begin{aligned}
&\mathbb{P}\left(\tau\leq\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1\right)\\
&\quad=\sum_{t=0}^{\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1}\mathbb{P}(\tau=t)\\
&\quad\leq\left(\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+2\right)\cdot\max\left\{\mathbb{P}(\tau=t);\;t\leq\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1\right\}\\
&\quad\prec\frac{Poly(n)}{SuperPoly(n)}.
\end{aligned}$$

Now we can estimate the expectation of the FHT $\tau$:
$$\begin{aligned}
\mathbb{E}[\tau\mid\tau\prec Poly(n)]&=\sum_{t=0}^{+\infty}t\,\mathbb{P}(\tau=t\mid\tau\prec Poly(n))\\
&=\sum_{t=0}^{Poly(n)}\frac{t\,\mathbb{P}(\tau=t,\tau\prec Poly(n))}{\mathbb{P}(\tau\prec Poly(n))}\\
&=\sum_{t=0}^{Poly(n)}\frac{t\,\mathbb{P}(\tau=t)}{\mathbb{P}(\tau\prec Poly(n))}\geq\sum_{t=0}^{Poly(n)}t\,\mathbb{P}(\tau=t)\\
&=\sum_{t=0}^{\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1}t\,\mathbb{P}(\tau=t)+\sum_{t=\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+2}^{Poly(n)}t\,\mathbb{P}(\tau=t)\\
&>\left(\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+2\right)\cdot\mathbb{P}\left(Poly(n)\succ\tau>\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1\right)\\
&=\left(\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+2\right)\Bigl(\mathbb{P}\left(\tau\prec Poly(n)\right)-\mathbb{P}\left(\tau\leq\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1\right)\Bigr)\\
&=\left(\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+2\right)\left(1-\frac{1}{SuperPoly(n)}-\frac{Poly(n)}{SuperPoly(n)}\right)\\
&\succ\left(\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+2\right)-\frac{Poly(n)}{SuperPoly(n)}-\frac{Poly(n)Poly(n)}{SuperPoly(n)}.
\end{aligned}$$
As $n\to\infty$, $\frac{Poly(n)}{SuperPoly(n)}\to 0$ and $\frac{Poly(n)Poly(n)}{SuperPoly(n)}\to 0$. Hence, there exists a sufficiently large problem size $n$ such that
$$\mathbb{E}[\tau\mid\tau\prec Poly(n)]>\lceil\mathbb{E}[\tau\mid\tau\prec Poly(n)]\rceil+1\tag{6}$$
which is an obvious contradiction. So we have proven the lemma. ■

Formally, an optimization problem can be denoted by $I=(\Omega,f)$, where $\Omega$ is the search space and $f$ the fitness function. Following He et al. [19], we use ${\cal P}=(\Omega,f,{\cal A})$ to indicate an algorithm ${\cal A}$ on a fitness function $f$ in the search space $\Omega$. Let the FHT of ${\cal A}$ on $I$ be $\tau({\cal P})$. The following theorem describes the relation between EDA-hardness and the probability $P_{t}^{\ast}$.

Theorem 1

For a given ${\cal P}$, if the population size $N$ of the EDA ${\cal A}$ is polynomial in the problem size $n$, then:

  1. if $I$ is EDA-easy for ${\cal A}$, then $\exists t^{\prime\prime}\leq\lceil\mathbb{E}[\tau({\cal P})\mid\tau({\cal P})\prec Poly(n)]\rceil+1$ such that
$$P_{t^{\prime\prime}}^{\ast}\succ\frac{1}{Poly(n)};$$
  2. if $\forall t=t(n)\prec Poly(n)$, $P_{t}^{\ast}\prec\frac{1}{SuperPoly(n)}$, then $I$ is EDA-hard for ${\cal A}$.
Proof

Note that the second part of this theorem is a corollary of the first part. We only need to prove the first part.

According to Lemma 1, we have
$$\mathbb{P}(\tau({\cal P})=i)<1-\left(1-P_{i}^{\prime}\right)^{N}.$$
On the other hand, according to Lemma 2, we know that $\exists t^{\prime}\leq\lceil\mathbb{E}[\tau({\cal P})\mid\tau({\cal P})\prec Poly(n)]\rceil+1$ such that
$$\mathbb{P}(\tau({\cal P})=t^{\prime})\succ\frac{1}{Poly(n)}.$$
Thus, we can define $t^{\prime\prime}$ as follows:
$$t^{\prime\prime}=\min\left\{t^{\prime};\;t^{\prime}\leq\lceil\mathbb{E}[\tau({\cal P})\mid\tau({\cal P})\prec Poly(n)]\rceil+1,\;\mathbb{P}(\tau({\cal P})=t^{\prime})\succ\frac{1}{Poly(n)}\right\}.\tag{7}$$
Since $\mathbb{P}(\tau({\cal P})=t^{\prime\prime})\succ\frac{1}{Poly(n)}$, we have
$$1-\left(1-P_{t^{\prime\prime}}^{\prime}\right)^{N}\succ\frac{1}{Poly(n)}.\tag{8}$$
Let us assume that $P_{t^{\prime\prime}}^{\ast}\prec\frac{1}{SuperPoly(n)}$. Let ${\cal E}$ represent the event "the global optimum is generated in one sampling at the $t^{\prime\prime}$th generation"; then, according to the definitions of $P_{t^{\prime\prime}}^{\ast}$ and $P_{t^{\prime\prime}}^{\prime}$ given in Section II-B, we obtain the following inequality:
$$P_{t^{\prime\prime}}^{\ast}=\mathbb{P}({\cal E})\geq\mathbb{P}({\cal E},\tau({\cal P})\geq t^{\prime\prime})=\mathbb{P}({\cal E}\mid\tau({\cal P})\geq t^{\prime\prime})\,\mathbb{P}(\tau({\cal P})\geq t^{\prime\prime})=P_{t^{\prime\prime}}^{\prime}\,\mathbb{P}(\tau({\cal P})\geq t^{\prime\prime}).\tag{9}$$
Meanwhile, (7) implies that
$$\mathbb{P}(\tau({\cal P})\geq t^{\prime\prime})\geq\mathbb{P}(\tau({\cal P})=t^{\prime\prime})\succ\frac{1}{Poly(n)}.\tag{10}$$
Combining (9) and (10), we know that $P_{t^{\prime\prime}}^{\ast}\prec\frac{1}{SuperPoly(n)}$ yields $P_{t^{\prime\prime}}^{\prime}\prec\frac{1}{SuperPoly(n)}$.

Now, $\forall f(n)\prec Poly(n)$, we estimate
$$\lim_{n\to\infty}\frac{1-\left(1-P_{t^{\prime\prime}}^{\prime}\right)^{N}}{1/f(n)}\tag{11}$$
where $N=N(n)\prec Poly(n)$ is the population size of the EDA. Equation (11) can be calculated as follows:
$$\begin{aligned}
\lim_{n\to\infty}\frac{1-\left(1-P_{t^{\prime\prime}}^{\prime}\right)^{N(n)}}{1/f(n)}&=\lim_{n\to\infty}\frac{1-\left(\left(1-P_{t^{\prime\prime}}^{\prime}\right)^{\frac{1}{P_{t^{\prime\prime}}^{\prime}}}\right)^{P_{t^{\prime\prime}}^{\prime}N(n)}}{1/f(n)}\\
&=\lim_{n\to\infty}\left(f(n)-f(n)e^{-P_{t^{\prime\prime}}^{\prime}N(n)}\right)\\
&=\lim_{n\to\infty}\left(f(n)-f(n)\left(1-P_{t^{\prime\prime}}^{\prime}N(n)+\frac{\left(P_{t^{\prime\prime}}^{\prime}N(n)\right)^{2}}{2}+o\left(\left(P_{t^{\prime\prime}}^{\prime}N(n)\right)^{2}\right)\right)\right)\\
&=\lim_{n\to\infty}f(n)P_{t^{\prime\prime}}^{\prime}N(n)-\lim_{n\to\infty}\frac{f(n)\left(P_{t^{\prime\prime}}^{\prime}N(n)\right)^{2}}{2}-\lim_{n\to\infty}o\left(f(n)\left(P_{t^{\prime\prime}}^{\prime}N(n)\right)^{2}\right)\\
&\prec\lim_{n\to\infty}\frac{Poly^{2}(n)}{SuperPoly(n)}-\lim_{n\to\infty}\frac{Poly^{3}(n)}{SuperPoly^{2}(n)}-\lim_{n\to\infty}o\left(\frac{Poly^{3}(n)}{SuperPoly^{2}(n)}\right)=0.
\end{aligned}$$
Hence, we know that $1-\left(1-P_{t^{\prime\prime}}^{\prime}\right)^{N}$ is smaller than $\frac{1}{f(n)}\succ\frac{1}{Poly(n)}$ as $n\to\infty$. In other words,
$$1-\left(1-P_{t^{\prime\prime}}^{\prime}\right)^{N}\prec\frac{1}{SuperPoly(n)}$$
which contradicts (8).

So we have
$$P_{t^{\prime\prime}}^{\ast}\succ\frac{1}{Poly(n)}.$$
The theorem is proven. ■

The theorem above provides us with two simple probability conditions related to the problem classification in terms of EDA-hardness. Later, we will use this theorem to obtain more specific results related to EDA-hardness for the UMDA.

SECTION III

TIME COMPLEXITY ANALYSIS OF EDAS WITH FINITE POPULATION SIZES

A. A General Approach to Analyzing EDAs With Finite Population Sizes

In the domain of EAs, several different approaches have been proposed for theoretically analyzing the FHT, such as drift analysis [16], [18], analytical Markov chains [17], Chernoff bounds [7], [23], [24], and convergence rates [15], [43]. Some of them have been applied to EDAs as well. González used the analytical Markov chain approach to study the worst-case exponential FHT of some EDAs [13]. Droste employed drift analysis and Chernoff bounds to analyze the time complexity of the cGA (with a population size of two) on linear pseudo-boolean functions [8]. However, those existing techniques might not be sufficient for the time complexity analysis of EDAs, because EDAs do not use any variation operators (e.g., mutation and crossover) but instead rely on sampling successive probabilistic models. Hence, some new ideas are needed to deal with probabilistic models.

One of the main difficulties in analyzing probabilistic models lies in the errors brought about by the random sampling processes. Such random errors may occur whenever a probabilistic model is updated via random sampling. An intuitive way of handling the random errors is to assume infinite population sizes for EDAs. This assumption has been adopted in most of the existing literature, such as the well-known example of OneMax given by Mühlenbein and Schlierkamp-Voosen [31], and the convergence analysis of EDAs by Zhang and Mühlenbein [45]. Two exceptions are Droste's aforementioned results on the cGA [8] and González's general worst-case analysis of EDAs [13].

In this section, we provide a general approach to theoretically analyzing EDAs with finite population sizes. The approach is closely related to Chernoff bounds and the discrete dynamic system model of population-based incremental learning (PBIL) [1]. PBIL is a more general version of the UMDA, and its discrete dynamic system model was first presented by González et al. [11], [12], [13]. Assume there is a function ${\cal G}\colon\mathbb{R}^{n}\to\mathbb{R}^{n}$; then $A(t+1)={\cal G}(A(t))$ $(t=0,1,\ldots)$ is called a discrete dynamic system [39]. In [11], [12], [13], two discrete dynamic systems were discussed. The first considers PBIL as a function ${\cal G}_{1}\colon[0,1]^{n}\to[0,1]^{n}$ that includes the random effects. Hence, even if the initial probability distribution and the algorithm parameters of PBIL are fixed, the system is still stochastic. This is an exact model of PBIL, but it is hard to analyze directly. The authors therefore considered a second dynamic system with a function ${\cal G}_{2}\colon[0,1]^{n}\to[0,1]^{n}$, which removes the random effects by assuming an infinite population size and thereby becomes deterministic. Although the deviation (caused by the random sampling errors) between the two dynamic systems was estimated, so as to study the fixed points of the first dynamic system by investigating those of the second, their method does not relate the deviation to the computation time of PBIL. Hence, it is not applicable to time complexity analysis.

Although González et al. [11], [12], [13] did not analyze the time complexity of EDAs, their mathematical models (using the discrete dynamic systems) can be used to develop a feasible approach to analyzing the time complexity of EDAs. Such an approach can be summarized by two major steps.

  1. Build an easy-to-analyze discrete dynamic system for the EDA. The idea is to de-randomize the EDA and build a deterministic dynamic system.
  2. Analyze the deviations caused by the de-randomization. Note that EDAs are stochastic algorithms. Concretely, tail probability techniques, such as Chernoff bounds, can be used to bound the deviations. (A toy numerical sketch of these two steps is given after this list.)
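The following Python sketch is a toy illustration of the two steps on a single marginal probability; the growth rule $p_{t+1}=\min\{1,Gp_{t}\}$ and all constants are invented for illustration and are not the UMDA model analyzed later. Step 1 is the deterministic trajectory; the stochastic trajectory re-estimates each update from $N$ samples, and step 2 would bound the printed deviation with a tail inequality.

```python
import random

def deterministic_step(p, G):
    """Step 1: the de-randomized system; here a toy rule p_{t+1} = min(1, G*p_t)."""
    return min(1.0, G * p)

def stochastic_step(p, G, N):
    """The stochastic counterpart: the same target probability, re-estimated
    from N random samples, which introduces sampling error."""
    target = min(1.0, G * p)
    hits = sum(random.random() < target for _ in range(N))
    return hits / N

p_det = p_sto = 0.5
G, N = 1.1, 10000
for t in range(20):
    p_det = deterministic_step(p_det, G)
    p_sto = stochastic_step(p_sto, G, N)
    # Step 2 would bound |p_det - p_sto| with a tail inequality (e.g., Chernoff).
    print(t, round(p_det, 4), round(p_sto, 4), round(abs(p_det - p_sto), 4))
```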

In this paper, we will use the UMDA as an example of EDAs to illustrate the time complexity analysis of EDAs using the above approach. The analysis will show that our approach provides a feasible way of estimating the random errors brought by finite populations in the UMDA, thus shedding some light on analyzing other EDAs with finite populations. However, it should be noted that much work remains to be done to achieve such a goal.

B. Univariate Marginal Distribution Algorithm

The UMDA was originally proposed as a discrete EDA [28], [44]. As one of the earliest and simplest EDAs, the UMDA has attracted a lot of research attention. The UMDA studied in this paper adopts binary encoding and one of the most commonly used selection strategies, truncation selection, which is described below.

Sort the $N$ individuals in the population by their fitness from high to low. Then select the best $M$ of them for estimating the probability distribution.

The general procedure of the UMDA studied in our paper is shown in Table II, where ${\bf x}=(x_{1},x_{2},\ldots,x_{n})\in\{0,1\}^{n}$ represents an individual, and $p_{t,i}(1)$ [$p_{t,i}(0)$] is the estimated marginal probability of the $i$th bit of an individual being 1 (0) at the $t$th generation. We also define the indicator $\delta(x_{i}\mid 1)$ as follows:
$$\delta(x_{i}\mid 1)\triangleq\begin{cases}1, & x_{i}=1\\ 0, & x_{i}=0.\end{cases}$$

TABLE II UNIVARIATE MARGINAL DISTRIBUTION ALGORITHM (UMDA) WITH TRUNCATION SELECTION

The marginal probabilities $p_{t,i}(1)$ and $p_{t,i}(0)$ are given by
$$p_{t,i}(1)\triangleq\frac{\sum_{{\bf x}\in\xi_{t}^{(s)}}\delta(x_{i}\mid 1)}{M},\qquad p_{t,i}(0)\triangleq 1-p_{t,i}(1).$$
Let
$${\bf P}_{t}({\bf x})\triangleq\left(p_{t,1}(x_{1}),p_{t,2}(x_{2}),\ldots,p_{t,n}(x_{n})\right)$$
where ${\bf P}_{t}({\bf x})$ is a probability vector made up of $n$ random variables (random because the UMDA is a stochastic algorithm). Then the probability of generating an individual ${\bf x}$ at the $t$th generation is
$$p_{t}({\bf x})=\prod_{i=1}^{n}p_{t,i}(x_{i}).$$
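Putting the above definitions together, the following Python sketch is our reconstruction of the UMDA with truncation selection (the body of Table II is not reproduced above, so this is an interpretation, not the authors' listing). The example run uses OneMax, i.e., `fitness=sum`, purely for illustration.

```python
import random

def umda(fitness, n, N, M, f_opt, max_gens=10**4):
    """UMDA with truncation selection, assembled from the definitions above."""
    p = [0.5] * n                                  # uniform initial marginals
    for t in range(max_gens):
        # Sample N individuals; p_t(x) = prod_i p_{t,i}(x_i).
        pop = [[int(random.random() < p[i]) for i in range(n)]
               for _ in range(N)]
        if any(fitness(x) == f_opt for x in pop):
            return t                               # first hitting time, cf. (2)
        # Truncation selection: sort by fitness from high to low, keep best M.
        sel = sorted(pop, key=fitness, reverse=True)[:M]
        # Marginal update: p_{t+1,i}(1) = (1/M) * sum of delta(x_i | 1).
        p = [sum(x[i] for x in sel) / M for i in range(n)]
    return None

# Example run on OneMax (f(x) = number of 1-bits), n = 20, N = 200, M = 100.
print(umda(fitness=sum, n=20, N=200, M=100, f_opt=20))
```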

C. Analyzing Time Complexity of UMDA

The UMDA given in the former section can be analyzed following the general idea presented in Section III-A. First, we define a function $\gamma\colon[0,1]^{n}\to[0,1]^{n}$ such that $\gamma={\cal S}\circ{\cal D}$, where ${\cal S}\colon[0,1]^{n}\to[0,1]^{n}$ represents the effect of selection, and ${\cal D}\colon[0,1]^{n}\to[0,1]^{n}$ eliminates the stochastic effects of the random sampling. Then we obtain a deterministic discrete dynamic system $\left\{\hat{\bf P}_{t}({\bf x}^{\ast});t=0,1,\ldots\right\}$ for the marginal probabilities of generating the global optimum:
$$\hat{\bf P}_{0}({\bf x}^{\ast})={\bf P}_{0}({\bf x}^{\ast})\tag{12}$$
$$\hat{\bf P}_{t+1}({\bf x}^{\ast})=\gamma\left(\hat{\bf P}_{t}({\bf x}^{\ast})\right)={\cal S}\left({\cal D}\left(\hat{\bf P}_{t}({\bf x}^{\ast})\right)\right)\tag{13}$$
$$\hat{\bf P}_{t}({\bf x}^{\ast})=\gamma^{t}\left(\hat{\bf P}_{0}({\bf x}^{\ast})\right)\tag{14}$$
where $\hat{\bf P}_{t}({\bf x})=\left(\hat{p}_{t,1}(x_{1}),\ldots,\hat{p}_{t,n}(x_{n})\right)$ is the marginal probability vector of the deterministic system for generating an individual ${\bf x}$, and ${\bf x}^{\ast}$ is the global optimum. Since the UMDA is usually initialized with a uniform distribution, we take $\hat{\bf P}_{0}({\bf x})={\bf P}_{0}({\bf x})=\left(\frac{1}{2},\ldots,\frac{1}{2}\right)$ in this paper. Correspondingly, the probability of generating an individual ${\bf x}$ is
$$\hat{p}_{t}({\bf x})=\prod_{i=1}^{n}\hat{p}_{t,i}(x_{i}).$$
Note that $p_{t}({\bf x})$ in the former section corresponds to the original UMDA, while $\hat{p}_{t}({\bf x})$ is obtained from the deterministic dynamic system after de-randomization. Following the first step of our general approach, we need to estimate the time complexity of the de-randomized UMDA.

To relate the time complexity result obtained for the deterministic system to the original UMDA, we must estimate the deviation of the de-randomized UMDA from the original UMDA. Since the time complexity of the former depends entirely on $\left\{\hat{\bf P}_{t}({\bf x}^{\ast});t=0,1,\ldots\right\}$, this deviation arises from the difference between $\left\{\hat{\bf P}_{t}({\bf x}^{\ast});t=0,1,\ldots\right\}$ and $\left\{{\bf P}_{t}({\bf x}^{\ast});t=0,1,\ldots\right\}$. Ideally, we would like to calculate the difference between the two sequences of marginal probability vectors exactly. However, this is non-trivial (if not impossible). Alternatively, we resort to estimating the probabilities that the deviations are smaller than some specific values. Two crucial lemmas for this task are given below.

Lemma 3 ([26]): Chernoff Bounds

Let $X_{1},X_{2},\ldots,X_{k}\in\{0,1\}$ be $k$ independent random variables (each taking the value 0 or 1) with the same distribution, i.e.,
$$\forall i\neq j\colon\;\mathbb{P}(X_{i}=1)=\mathbb{P}(X_{j}=1)$$
where $i,j\in\{1,\ldots,k\}$. Let $X$ be the sum of these random variables, i.e., $X=\sum_{i=1}^{k}X_{i}$. Then we have:

  1. $\forall\,0<\delta<1$:
$$\mathbb{P}\left(X<(1-\delta)\mathbb{E}[X]\right)<e^{-\mathbb{E}[X]\delta^{2}/2};$$
  2. $\forall\,\delta\leq 2e-1$:
$$\mathbb{P}\left(X>(1+\delta)\mathbb{E}[X]\right)<e^{-\mathbb{E}[X]\delta^{2}/4}.$$
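A quick Monte Carlo sanity check of the first bound can be run as follows (our own illustration; the parameters $k$, $q$, $\delta$ are arbitrary choices):

```python
import math, random

def chernoff_check(k=500, q=0.3, delta=0.2, trials=5000):
    """Empirical frequency of X < (1-delta)E[X] versus exp(-E[X] delta^2 / 2)."""
    ex = k * q                                    # E[X] for X ~ Binomial(k, q)
    bound = math.exp(-ex * delta ** 2 / 2)        # Chernoff bound, part 1
    low = sum(sum(random.random() < q for _ in range(k)) < (1 - delta) * ex
              for _ in range(trials))
    print("empirical:", low / trials, " Chernoff bound:", bound)

chernoff_check()
```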

Lemma 4 ([21], [38])

Consider sampling without replacement from a finite population $(X_{1},\ldots,X_{N})\in\{0,1\}^{N}$. Let $(Y_{1},\ldots,Y_{M})\in\{0,1\}^{M}$ be a sample of size $M$ drawn randomly without replacement from the whole population, and let $Y^{(M)}$ and $X^{(N)}$ be the sums of the random variables in the sample and the population, respectively, i.e., $Y^{(M)}=\sum_{i=1}^{M}Y_{i}$ and $X^{(N)}=\sum_{i=1}^{N}X_{i}$. Then we have
$$\begin{aligned}
\mathbb{P}\left(Y^{(M)}-\frac{MX^{(N)}}{N}\geq M\delta\right)&\leq e^{-\frac{2M\delta^{2}}{1-(M-1)/N}}<e^{-2M\delta^{2}}\\
\mathbb{P}\left(\left\vert Y^{(M)}-\frac{MX^{(N)}}{N}\right\vert>M\delta\right)&\leq 2e^{-\frac{2M\delta^{2}}{1-(M-1)/N}}<2e^{-2M\delta^{2}}
\end{aligned}$$
where $\delta\in[0,1]$ is some constant.
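The bound of Lemma 4 can likewise be checked empirically by drawing samples without replacement from a fixed 0/1 population (again an illustration with arbitrary parameters, not part of the proof):

```python
import math, random

def lemma4_check(N=2000, M=500, ones=800, delta=0.05, trials=5000):
    """Tail frequency of |Y^(M) - M X^(N)/N| > M delta versus 2 exp(-2 M delta^2)."""
    population = [1] * ones + [0] * (N - ones)
    mean = M * ones / N                           # M * X^(N) / N
    bound = 2 * math.exp(-2 * M * delta ** 2)
    exceed = sum(abs(sum(random.sample(population, M)) - mean) > M * delta
                 for _ in range(trials))
    print("empirical tail:", exceed / trials, " bound:", bound)

lemma4_check()
```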

Another issue that will be involved in our further analysis is estimating the probability of events of the form
$$\forall t\in\mathbb{N}_{0}\colon\;p_{t}({\bf x}^{\ast})\oplus\hat{p}_{t}({\bf x}^{\ast})\tag{15}$$
where $\oplus\in\{\leq,\geq\}$. As we will show soon, these can be handled on the basis of the estimated probabilities of the deviations. Finally, before presenting the case studies in detail, it should be noted that we always consider finite population sizes throughout this paper. Although we will sometimes use a statement like "when the problem size becomes sufficiently large," that does not mean that we assume infinite population sizes; it is merely used to obtain the asymptotic order of a function of the problem size $n$. The main difference is that the infinite population assumption implies infinite population sizes for all problem sizes (so that the random sampling errors are removed), whereas in our case the population size is infinite only if the problem size is infinite.

SECTION IV

WORST CASE ANALYSIS OF UMDA ON THE LEADINGONES PROBLEM

The first maximization problem we investigate is the LeadingOnes problem, formally defined as follows:
$$\mathrm{LeadingOnes}({\bf x})\triangleq\sum_{i=1}^{n}\prod_{j=1}^{i}x_{j},\quad x_{j}\in\{0,1\}.\tag{16}$$

The global optimum of LeadingOnes is ${\bf x}^{\ast}=(1,\ldots,1)$. The fitness of an individual is the number of leading 1-bits in the individual, and it is not influenced by any bits to the right of the leftmost 0-bit. The values of the bits to the right of the leftmost 0-bit will not influence the output of fitness-based selection operators in EAs. Due to this characteristic, a population will begin to converge to 1 at a bit only after the bits to its left have almost converged to 1's, so a sequential convergence phenomenon, namely domino convergence [3], [36], [41], occurs.
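For concreteness, a direct Python implementation of (16) follows (a straightforward sketch; the paper itself defines the problem only mathematically):

```python
def leading_ones(x):
    """LeadingOnes(x) = sum_{i=1}^{n} prod_{j=1}^{i} x_j, i.e., the number of
    leading 1-bits; bits to the right of the leftmost 0 do not matter, cf. (16)."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

assert leading_ones([1, 1, 0, 1, 1]) == 2   # trailing bits are ignored
assert leading_ones([1, 1, 1, 1, 1]) == 5   # the global optimum for n = 5
```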

In the literature on EDAs, the LeadingOnes problem has been investigated empirically [10], but no rigorous theoretical result exists. This section provides the first theoretical result, which lays a sound foundation for the time complexity analysis of the UMDA on this problem.

First, we introduce the following concept.

Definition 1 ($b$-Promising Individual)

In a population containing $N$ individuals, the $b$-promising individuals are those whose fitness is no smaller than a threshold $b$.

Since the UMDA adopts the truncation selection, we have the following lemma.

Lemma 5

For the UMDA with truncation selection, the proportion of the $b$-promising individuals after selection at the $t$th generation satisfies
$$Q_{t,b}^{(s)}=\begin{cases}\dfrac{Q_{t,b}N}{M}, & Q_{t,b}\leq\dfrac{M}{N}\\[6pt] 1, & Q_{t,b}>\dfrac{M}{N}\end{cases}\tag{17}$$
where $Q_{t,b}\leq 1$ is the proportion of the $b$-promising individuals before the truncation selection.
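Equation (17) is a one-line computation; the following sketch (ours, for illustration) makes explicit that truncation selection multiplies the proportion of $b$-promising individuals by $N/M$, capped at 1:

```python
def promising_after_selection(Q, N, M):
    """Q^{(s)}_{t,b} from (17): truncation selection keeps the best M of N
    individuals, so a proportion Q of b-promising ones becomes min(1, Q*N/M)."""
    return min(1.0, Q * N / M)

# With N = 100, M = 50: a 30% share before selection becomes 60% after it,
# while an 80% share saturates at 1 (more than M individuals are promising).
print(promising_after_selection(0.3, 100, 50))   # 0.6
print(promising_after_selection(0.8, 100, 50))   # 1.0
```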

Define the $i$-convergence time $T_{i}$ to be the number of generations needed for a discrete EDA to converge to the globally optimal value on the $i$th bit of the solution; formally,
$$T_{i}\triangleq\min\left\{t;\;p_{t,i}\left(x^{\ast}_{i}\right)=1\right\}.$$
Let $T_{0}=0$.

Moreover, in the following parts of the paper, we use the notation "$\omega$" to describe the relationship between the asymptotic orders of two functions [5], [24]. Given two positive functions of the problem size $n$, say $f=f(n)$ and $g=g(n)$, $f=\omega(g)$ holds if and only if $\lim_{n\to\infty}g(n)/f(n)=0$. Now we reach the following theorem.

Theorem 2

Given the population sizes $N=\omega(n^{2+\alpha}\log n)$ and $M=\omega(n^{2+\alpha}\log n)$ (where $\alpha$ can be any positive constant) with $M=\beta N$ ($\beta\in(0,1)$ is some constant), for the UMDA with truncation selection on the LeadingOnes problem, initialized with a uniform distribution, at least with probability
$$\left(1-n^{-\omega(n^{2+\alpha})\delta^{2}}\right)^{\bar{\tau}}\left(1-n^{-\left(1-\left(\frac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2}\omega(1)}\right)^{2(n-1)\bar{\tau}}$$
its FHT satisfies
$$\tau<\bar{\tau}=\frac{n\left(\ln\frac{eM}{N}-\ln(1-\delta)\right)}{\ln(1-\delta)+\ln\left(\frac{N}{M}\right)}+2n$$
where $\delta\in\left(\max\left\{0,1-\frac{2M}{N}\right\},1-\frac{M}{N}\right)$ is a positive constant, and $\bar{\tau}$ represents an upper bound of the random variable $\tau$. In other words, the LeadingOnes problem is EDA-easy for the UMDA.
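Before the proof, it may help to see the order of this bound numerically. The following sketch (our own illustration; the concrete values of $n$, $N$, $M$, and $\delta$ are arbitrary choices satisfying the theorem's constraints) evaluates $\bar{\tau}$ and shows that it grows linearly in $n$ for fixed $\beta=M/N$ and $\delta$:

```python
import math

def tau_bar(n, N, M, delta):
    """The upper bound on the FHT from Theorem 2 (numerical illustration only).
    Requires delta in (max{0, 1 - 2M/N}, 1 - M/N)."""
    assert max(0.0, 1 - 2 * M / N) < delta < 1 - M / N
    num = n * (math.log(math.e * M / N) - math.log(1 - delta))
    den = math.log(1 - delta) + math.log(N / M)
    return num / den + 2 * n

# beta = M/N = 0.5, delta = 0.25: the bound doubles as n doubles.
for n in (100, 200, 400):
    print(n, round(tau_bar(n, N=10**6, M=5 * 10**5, delta=0.25), 1))
```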

Proof

The basic idea of the proof follows the approach outlined in the former section. We first de-randomize the UMDA. Since the LeadingOnes problem exhibits domino convergence, we can further divide the optimization process into $n$ stages. The $i$th stage starts when all bits to the left of the $i$th bit have converged to 1's, and ends when the $i$th bit has converged. Suppose generation $t+1$ belongs to the $i$th stage; then the marginal probabilities at that generation are
$$\hat{\bf P}_{t+1}({\bf x}^{\ast})=\gamma_{i}\left(\hat{\bf P}_{t}({\bf x}^{\ast})\right)=\Bigl(\hat{p}_{t,1}\left(x_{1}^{\ast}\right),\ldots,\hat{p}_{t,i-1}\left(x_{i-1}^{\ast}\right),\left[G\,\hat{p}_{t,i}\left(x_{i}^{\ast}\right)\right],R\,\hat{p}_{t,i+1}\left(x_{i+1}^{\ast}\right),\ldots,R\,\hat{p}_{t,n}\left(x_{n}^{\ast}\right)\Bigr)$$
where ${\bf x}^{\ast}=\left(x_{1}^{\ast},\ldots,x_{n}^{\ast}\right)=(1,\ldots,1)$ is the global optimum of the LeadingOnes problem, $G=(1-\delta)\frac{N}{M}$ ($\delta\in\left(\max\left\{0,1-\frac{2M}{N}\right\},1-\frac{M}{N}\right)$ is a constant), and $R=(1-\eta)(1-\eta^{\prime})$ ($\eta<1$ and $\eta^{\prime}<1$ are positive functions of the problem size $n$). We consider three different cases in the above equation.

  1. $j\in\{1,\ldots,i-1\}$. In the deterministic system above, the marginal probabilities $\hat{p}_{t,j}(x_{j}^{\ast})$ have converged to 1, and thus they will not change at the next generation.
  2. $j=i$. In the deterministic system above, the marginal probability $\hat{p}_{t,i}\left(x_{i}^{\ast}\right)$ is converging, and we use the factor $G=(1-\delta)\frac{N}{M}$ to describe the impact of selection pressure on this converging marginal probability, where $\frac{N}{M}$ represents the influence of the selection operator (see Lemma 5).
  3. $j\in\{i+1,\ldots,n\}$. The $j$th bits of individuals are not exposed to selection pressure, and we use the factor $R=(1-\eta)(1-\eta^{\prime})$ to describe the impact of genetic drift on these marginal probabilities.

In Case 3, we consider the $j$th marginal probability $p_{\cdot,j}\left(x_{j}^{\ast}\right)$ $(j\in\{i+1,\ldots,n\})$, which is not affected by the selection pressure. This is rather pessimistic, because the UMDA tends to preserve the value $x_{j}^{\ast}=1$, which leads to higher fitness, and thus tends to increase $p_{\cdot,j}\left(x_{j}^{\ast}\right)$. Utilizing the idea mentioned in (15), we will study the time complexity of the UMDA by studying the above deterministic system, and we will estimate the deviation between the deterministic system and the real UMDA in terms of the probability that the stochastic marginal probabilities of the UMDA are bounded by the corresponding deterministic marginal probabilities. Before the analysis, we first give the formal definition of the deterministic system.

With $\hat{\bf P}_{0}({\bf x}^{\ast})=\left(\frac{1}{2},\ldots,\frac{1}{2}\right)$, we have
$$\hat{\bf P}_{t}({\bf x}^{\ast})=\gamma_{i}^{t-T_{i-1}}\left(\hat{\bf P}_{T_{i-1}}({\bf x}^{\ast})\right)$$
where $T_{i-1}<t\leq T_{i}$ $(i=1,\ldots,n)$. Since $\{\gamma_{i}\}_{i=1}^{n}$ de-randomizes the whole optimization process, the $\{T_{i}\}_{i=1}^{n}$ in the above equation are no longer random variables. For the sake of clarity, we rewrite the above equation as
$$\hat{\bf P}_{t}({\bf x}^{\ast})=\gamma_{i}^{t-\hat{T}_{i-1}}\left(\hat{\bf P}_{\hat{T}_{i-1}}({\bf x}^{\ast})\right)$$
where $\hat{T}_{i-1}<t\leq\hat{T}_{i}$ $(i=1,\ldots,n)$. As we will show immediately, $\hat{T}_{i}$ $(1\leq i\leq n)$ is an upper bound of the random variable $T_{i}$ with some probability. Since $T_{n}\geq\tau$, our task finally becomes calculating $\hat{T}_{n}$ and the probability that $\hat{T}_{n}$ holds as an upper bound of $T_{n}$.

Now we present the proof in detail. First, we estimate $\hat{T}_{1}$ and $T_{1}$ for the UMDA, which is the first stage of our analysis. Consider the 1-promising individuals, whose first bits are 1's. The sampling procedure of the UMDA can be considered as a large number of events resulting in either 0 or 1. Hence, when $p_{t-1,1}(1)\leq\frac{M}{N(1-\delta)}$, by noting Lemma 5, we can apply Chernoff bounds to the sampling procedure of the UMDA to obtain
$$\mathbb{P}\left(Mp_{t,1}(1)\geq(1-\delta)p_{t-1,1}(1)N\;\Big|\;p_{t-1,1}(1)\leq\frac{M}{N(1-\delta)}\right)>1-e^{-\frac{p_{t-1,1}(1)N}{2}\delta^{2}}$$
where $N=\omega(n^{2}\log n)$; thus the probability above is super-polynomially close to 1, i.e., an overwhelming probability. An equivalent form of the inequality above is
$$\mathbb{P}\left(p_{t,1}(1)\geq(1-\delta)\frac{p_{t-1,1}(1)N}{M}\;\Big|\;p_{t-1,1}(1)\leq\frac{M}{N(1-\delta)}\right)>1-e^{-\frac{p_{t-1,1}(1)N}{2}\delta^{2}}$$
which demonstrates that, with an overwhelming probability, the marginal probability $p_{t,1}(1)$ is lower bounded by $Gp_{t-1,1}(1)=(1-\delta)\frac{p_{t-1,1}(1)N}{M}$. Furthermore, given $\hat{p}_{t,1}(1)=G^{t}\hat{p}_{0,1}(1)$ and $G>1$, we can obtain the inequality in Table III.

TABLE III CALCULATION OF PROBABILITY THAT $p_{t,1}(1)$ IS LOWER BOUNDED BY $\hat{p}_{t,1}(1)$

We now study the distribution of $T_{1}$. Consider the probability that $T_{1}$ is bounded by some value, say $\hat{T}_{1}$: given $T_{1}<\hat{T}_{1}$, then according to Lemma 5, at the $(\hat{T}_{1}-1)$th generation the marginal probability $p_{\hat{T}_{1}-1,1}(1)$ should be at least $\frac{M}{N(1-\delta)}$. This proposition is presented in Table IV, where in (19) the factor $\left(1-e^{-\frac{\hat{p}_{0,1}(1)N}{2}\delta^{2}}\right)$ is included because we apply Chernoff bounds once at the end of the $(\hat{T}_{1}-1)$th generation to obtain the probability that $\hat{p}_{\hat{T}_{1},1}(1)=1$, under the condition $\hat{p}_{\hat{T}_{1}-1,1}(1)\geq\frac{M}{N(1-\delta)}$. Now let us consider the following item. Noting that $\hat{p}_{\hat{T}_{1}-1,1}(1)$ is deterministic, we know that
$$\mathbb{P}\left(\hat{p}_{\hat{T}_{1}-1,1}(1)>\frac{M}{N(1-\delta)}\;\Big|\;p_{0,1}(1)=\hat{p}_{0,1}(1)\right)\tag{24}$$
must be either 0 or 1, and we need to find the value of $\hat{T}_{1}$ that makes the probability above 1. Given that $\hat{p}_{0,1}(1)=\frac{1}{2}$, the condition $\forall t<\hat{T}_{1}-1\colon\frac{M}{N(1-\delta)}>\hat{p}_{t,1}(1)>(1-\delta)\frac{\hat{p}_{t-1,1}(1)N}{M}$ and Lemma 5 together imply the following inequalities:
$$\begin{aligned}
G^{\hat{T}_{1}-2}\hat{p}_{0,1}(1)&=(1-\delta)^{\hat{T}_{1}-2}\left(\frac{N}{M}\right)^{\hat{T}_{1}-2}\hat{p}_{0,1}(1)<\frac{M}{N(1-\delta)}\\
G^{\hat{T}_{1}-1}\hat{p}_{0,1}(1)&=(1-\delta)^{\hat{T}_{1}-1}\left(\frac{N}{M}\right)^{\hat{T}_{1}-1}\hat{p}_{0,1}(1)\geq\frac{M}{N(1-\delta)}.
\end{aligned}$$

TABLE IV CALCULATION OF PROBABILITY THAT $T_{1}$ IS UPPER BOUNDED BY $\hat{T}_{1}$

Solving the inequalities above, we get
$$\hat{T}_{1}\leq\frac{\ln\frac{2M}{N}-\ln(1-\delta)}{\ln(1-\delta)+\ln\left(\frac{N}{M}\right)}+2$$
where $\delta\in\left(\max\left\{0,1-\frac{2M}{N}\right\},1-\frac{M}{N}\right)$ is a constant, and it is easy to show that $\hat{T}_{1}=\Theta(1)$. On the other hand, recalling the inequalities in Table III, we can continue to estimate the corresponding probability mentioned in (18):
$$\begin{aligned}
&\mathbb{P}\left(T_{1}\leq\hat{T}_{1}\mid p_{0,1}(1)=\hat{p}_{0,1}(1)\right)\\
&\quad>\mathbb{P}\left(p_{\hat{T}_{1}-1,1}(1)\geq\hat{p}_{\hat{T}_{1}-1,1}(1)\mid p_{0,1}(1)=\hat{p}_{0,1}(1)\right)\cdot\left(1-e^{-\frac{\hat{p}_{0,1}(1)N}{2}\delta^{2}}\right)\\
&\quad>\left(1-e^{-\frac{\hat{p}_{0,1}(1)N}{2}\delta^{2}}\right)^{\hat{T}_{1}}.
\end{aligned}\tag{25}$$
The analysis above tells us that the probability that the marginal probability converges before the $\hat{T}_{1}$th generation $(T_{1}<\hat{T}_{1})$ is at least $\left(1-e^{-\frac{N}{4}\delta^{2}}\right)^{\hat{T}_{1}}$. Since $N=\omega(n^{2+\alpha}\log n)$, $M=\beta N$ ($\beta\in(0,1)$ is a constant), and $\hat{T}_{1}$ is polynomial in the problem size $n$, we know that the probability is overwhelming.

At every stage, the bits to the right of the currently converging bit are not exposed to selection pressure. However, we must still consider the errors brought by the repeated sampling procedures of the UMDA, which relate to genetic drift [6], [41].

Take the first stage as an example. The $j$th bit $(j=2,\ldots,n)$ is affected by genetic drift. First, we utilize Chernoff bounds to study the deviations brought by the random sampling procedures of the UMDA:
$$\mathbb{P}\left(N_{t,j}\left(x_{j}^{\ast}\right)\geq(1-\eta)p_{t-1,j}\left(x_{j}^{\ast}\right)N\;\Big|\;p_{t-1,j}\left(x_{j}^{\ast}\right)\right)>1-e^{-\frac{p_{t-1,j}(1)N}{2}\eta^{2}}$$
where $\eta$ is a parameter that controls the size of the deviation, and $N_{t,j}(x_{j})$ is the number of individuals taking the value $x_{j}$ in their $j$th bit in the population before selection, $\xi_{t}$. Here we set $\eta=\left(\frac{1}{n}\right)^{1+\frac{\alpha}{2}}$ and obtain
$$\mathbb{P}\left(N_{t,j}\left(x_{j}^{\ast}\right)\geq\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)p_{t-1,j}\left(x_{j}^{\ast}\right)N\;\Big|\;p_{t-1,j}\left(x_{j}^{\ast}\right)\right)>1-e^{-\frac{p_{t-1,j}\left(x_{j}^{\ast}\right)\omega(\log n)}{2}}=1-n^{-\frac{p_{t-1,j}\left(x_{j}^{\ast}\right)\omega(1)}{2}}.$$

Second, we further consider the selection procedure, since it may also bring some deviations. In our worst case analysis, the $j$th bits of individuals are considered not to be exposed to the selection pressure; for these bits, the selection procedure can then be regarded as taking a simple random sample of $M$ individuals from a finite population of $N$ individuals [34]. More precisely, since one individual cannot be selected more than once by the truncation selection, this procedure is known in statistics as random sampling without replacement from a finite population [34]. From Lemma 4, we can bound from below the probability that the number of individuals taking the value $x_{j}^{\ast}$ on their $j$th bits after selection [denoted by $N^{(s)}_{t,j}\left(x_{j}^{\ast}\right)$] is lower bounded, as shown by the inequalities presented in Table V, where $\eta^{\prime}$ is a parameter that controls the size of the deviation, and $N^{(s)}_{t,j}\left(x_{j}^{\ast}\right)=p_{t,j}\left(x_{j}^{\ast}\right)M$. Setting $\eta^{\prime}=\eta=\left(\frac{1}{n}\right)^{1+\frac{\alpha}{2}}$, since $M=\omega(n^{2+\alpha}\log n)$ we obtain
$$\begin{aligned}
&\mathbb{P}\left(p_{t,j}\left(x_{j}^{\ast}\right)\geq\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2}p_{t-1,j}\left(x_{j}^{\ast}\right)\;\Big|\;p_{t-1,j}\left(x_{j}^{\ast}\right)\right)\\
&\quad>\left(1-n^{-p_{t-1,j}\left(x_{j}^{\ast}\right)\omega(1)}\right)\cdot\left(1-n^{-\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2}p^{2}_{t-1,j}\left(x_{j}^{\ast}\right)\omega(1)}\right)\\
&\quad>\left(1-n^{-\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2}p^{2}_{t-1,j}\left(x_{j}^{\ast}\right)\omega(1)}\right)^{2}.
\end{aligned}$$

TABLE V BOUNDING $N^{(s)}_{t,j}\left(x_{j}^{\ast}\right)$ FROM BELOW WITH AN OVERWHELMING PROBABILITY

Since the factor $R=\left(1-\left(\frac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2}<1$, for all $j=2,\ldots,n$ and $t=1,\ldots,\hat{T}_{1}$, similar to the analysis shown in Table III, we further obtain
$$\mathbb{P}\left(p_{t,j}\left(x_{j}^{\ast}\right)\geq\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2t}p_{0,j}\left(x_{j}^{\ast}\right)\;\Big|\;p_{0,j}\left(x_{j}^{\ast}\right)=\hat{p}_{0,j}\left(x_{j}^{\ast}\right)\right)>\left(1-n^{-\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2}\hat{p}^{2}_{t-1,j}\left(x_{j}^{\ast}\right)\omega(1)}\right)^{2t}.\tag{26}$$
Given any $t=O(n)$, according to the definition of the deterministic system, we know that
$$\hat{p}_{t,j}\left(x_{j}^{\ast}\right)\geq\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{O(n)}\hat{p}_{0,j}\left(x_{j}^{\ast}\right)>\frac{1}{e}$$
holds. The above inequality implies that within $t=O(n)$ generations, the probability in (26) is an overwhelming one.

To generalize the above analysis to the other stages, let us consider the moment when the $i$th $(i\in\{2,\ldots,n\})$ stage is about to start. Due to genetic drift, the marginal probability $p_{t,j}\left(x_{j}^{\ast}\right)$ $(j\in\{i,\ldots,n\})$ may have dropped below the initial value $\frac{1}{2}$ by the multiplicative factor $R^{t}$. We are concerned with the value of $p_{t,i}\left(x_{i}^{\ast}\right)$. For any $t=O(n)$, similarly to (26), the probability that $p_{t,i}\left(x_{i}^{\ast}\right)$ maintains a level of
$$p_{t,i}\left(x_{i}^{\ast}\right)\geq\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{O(n)}\hat{p}_{0,i}\left(x_{i}^{\ast}\right)>\frac{1}{e}\tag{27}$$
is super-polynomially close to 1 (an overwhelming probability).

According to (27), we know that $p_{t,i}\left(x_{i}^{\ast}\right)$ stays above $\frac{1}{e}$ with an overwhelming probability. Consequently, the joint probability that the first bit has converged to 1 and that genetic drift has not reduced $p_{\hat{T}_{1},2}(1)$ below $\frac{1}{e}$ by the end of the first stage is
$$\left(1-e^{-\frac{\omega(n^{2+\alpha}\log n)}{2e}\delta^{2}}\right)^{\hat{T}_{1}}\left(1-n^{-\left(1-\left(\tfrac{1}{n}\right)^{1+\frac{\alpha}{2}}\right)^{2}\omega(1)}\right)^{2\hat{T}_{1}}\tag{28}$$
which is again an overwhelming probability. This finishes the analysis of the first stage.

According to the dynamic system described at the beginning of the proof, in the second stage, for $\hat{T}_{1}<t\leq\hat{T}_{2}$, we have
$$\hat{p}_{t,2}(1)=G\,\hat{p}_{t-1,2}(1).$$
Given $\hat{T}_{1}$ and the corresponding marginal probabilities, we consider the joint probability that $T_{2}$ is bounded above by $\hat{T}_{2}$ via the inequalities presented in Table VI.

Table 6
TABLE VI CALCULATION OF THE JOINT PROBABILITY THAT Formula$T_{2}$ IS BOUNDED ABOVE BY Formula${\mathhat {T}}_{2}$

Let us consider the following item of the probability estimated in Table VI: Formula TeX Source $$\eqalignno{& \BBP\biggl({\mathhat {p}}_{{\mathhat{T}}_{2}-1,2}(1)> {{M}\over {N(1-\delta)}}\mid p_{{\mathhat{T}}_{1},1}(1)=1,\cr& \quad \quad \quad \quad \quad \quad \quad\quad \quad \quad p_{{\mathhat {T}}_{1},2}(1)\geq {\mathhat{p}}_{{\mathhat {T}}_{1},2}(1)> {{1}\over {e}}\biggr).}$$ Since Formula$\{{\mathhat {p}}_{t,2}(1)\}_{t=0}^{\infty }$ is a deterministic sequence, the above item must be either 0 or 1. Noting that Formula${\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)> {{1}\over {e}}$, given the condition that Formula$\forall t\colon {\mathhat {T}}_{1}< t< {\mathhat {T}}_{2}-1\colon {{M}\over {N(1-\delta)}}> {\mathhat {p}}_{t,2}(1)=(1-\delta) {{{\mathhat {p}}_{t-1,2}(1)N}\over {M}}$, we can solve the following inequalities to obtain Formula${\mathhat {T}}_{2}$ Formula TeX Source $$\eqalignno{& G^{{\mathhat {T}}_{2}- {\mathhat {T}}_{1}-2} {\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)\cr& \quad =\left ((1-\delta)\left({{N}\over {M}}\right)\right)^{{\mathhat {T}}_{2}- {\mathhat{T}}_{1}-2} {\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)< {{M}\over {N(1-\delta)}}\cr& G^{{\mathhat {T}}_{2}- {\mathhat {T}}_{1}-1} {\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)\cr& \quad =\left ((1-\delta)\left({{N}\over {M}}\right)\right)^{{\mathhat {T}}_{2}- {\mathhat{T}}_{1}-1} {\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)\geq {{M}\over {N(1-\delta)}}.}$$ Moreover, another item in (22) Formula TeX Source $$\eqalignno{& \BBP\Biggl(p_{{\mathhat {T}}_{2}-1,2}(1)\geq{\mathhat {p}}_{{\mathhat {T}}_{2}-1,2}(1)\mid p_{{\mathhat {T}}_{1},1}(1)=1,\cr& \quad p_{{\mathhat {T}}_{1},2}(1)\geq {\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)> {{1}\over{e}},{\mathhat {p}}_{{\mathhat {T}}_{2}-1,2}(1)> {{M}\over {N(1-\delta)}}\Biggr)}$$ should be estimated. This can be done in the same way as in Table III. We then obtain that Formula TeX Source $$T_{2}< {\mathhat {T}}_{2}\leq {{2\ln {{eM}\over {N}}-2\ln(1-\delta)}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}+4$$ holds with the probability [the product of the items mentioned in (22)] Formula TeX Source $$\left(1- e^{- {{\omega (n^{2+\alpha }\log n)}\over {2e}}\delta ^{2}}\right)^{{\mathhat {T}}_{2}}\left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{T}}_{1}}.$$ The above analysis can be readily extended to other stages. To be specific, at the Formula$i$th stage, the Formula$i$-promising individuals are taken into account. We have Formula TeX Source $${\mathhat {p}}_{t,i}(1)=G {\mathhat {p}}_{t-1,i}(1).$$
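For completeness, the solution of the above pair of inequalities is a routine calculation (sketched here using Formula${\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)> {{1}\over {e}}$, i.e., Formula$-\ln {\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)< 1$). Taking logarithms of the first inequality gives Formula TeX Source $${\mathhat {T}}_{2}- {\mathhat {T}}_{1}< {{\ln {{M}\over {N(1-\delta)}}-\ln {\mathhat {p}}_{{\mathhat {T}}_{1},2}(1)}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}+2< {{\ln {{eM}\over {N}}-\ln (1-\delta)}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}+2$$ and adding the bound on Formula${\mathhat {T}}_{1}$ of the same form yields the stated estimate for Formula${\mathhat {T}}_{2}$.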

For induction, assume that at the Formula$(i-1)$th stage Formula TeX Source $$\eqalignno{T_{i-1}<&\, {\mathhat {T}}_{i-1}\leq {{(i-1)\ln{{eM}\over {N}}-(i-1)\ln(1-\delta)}\over {\ln (1-\delta)+\ln\left({{N}\over {M}}\right)}} \cr& +2(i-1)& {\hbox{(29)}}}$$ holds with the probability Formula TeX Source $$\eqalignno{& \left(1- e^{- {{\omega (n^{2+\alpha }\log n)}\over {4}}\delta ^{2}}\right)^{{\mathhat {T}}_{i-1}}\cr & \quad \quad\quad\cdot \prod _{k=1}^{i-2}\left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{T}}_{k}}.}$$

To estimate Formula${\mathhat {T}}_{i}$, we solve the following inequalities: Formula TeX Source $$\eqalignno{& G^{{\mathhat {T}}_{i}- {\mathhat {T}}_{i-1}-2} {\mathhat {p}}_{{\mathhat {T}}_{i-1},i}(1)\cr& \quad =(1-\delta)^{{\mathhat {T}}_{i}- {\mathhat {T}}_{i-1}-2}\left ({{N}\over {M}}\right)^{{\mathhat {T}}_{i}- {\mathhat {T}}_{i-1}-2} {\mathhat {p}}_{{\mathhat {T}}_{i-1},i}(1)\cr& \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad < {{M}\over{N(1-\delta)}}\cr& G^{{\mathhat {T}}_{i}- {\mathhat {T}}_{i-1}-1} {\mathhat {p}}_{{\mathhat {T}}_{i-1},i}(1)\cr& \quad =(1-\delta)^{{\mathhat {T}}_{i}- {\mathhat {T}}_{i-1}-1}\left ({{N}\over {M}}\right)^{{\mathhat {T}}_{i}- {\mathhat {T}}_{i-1}-1} {\mathhat {p}}_{{\mathhat {T}}_{i-1},i}(1)\cr& \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \geq {{M}\over{N(1-\delta)}}}$$ where Formula${\mathhat {p}}_{{\mathhat {T}}_{i-1},i}(1)> {{1}\over {e}}$ [similar to (27)], since Formula${\mathhat {T}}_{i-1}=O(n)$ [our induction assumption in (29) shows that it is Formula$O(n)$]. Similar to the discussion of the second stage, we obtain that Formula TeX Source $$T_{i}< {\mathhat {T}}_{i}\leq {{i\ln {{eM}\over {N}}-i\ln(1-\delta)}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}+2i$$ holds with the probability Formula TeX Source $$\eqalignno{& \hskip 24pt \left(1- e^{- {{\omega (n^{2+\alpha }\log n)}\over {2e}}\delta ^{2}}\right)^{{\mathhat {T}}_{i}}\cr& \cdot \prod _{k=1}^{i-1}\left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{T}}_{k}}.}$$

Finally, the FHT Formula$\tau$ is upper bounded by Formula TeX Source $$\tau < {\mathhat {T}}_{n}= {{n\left(\ln{{eM}\over {N}}-\ln (1-\delta)\right)}\over {\ln (1-\delta)+\ln \left({{N}\over{M}}\right)}}+2n$$ with a probability of Formula TeX Source $$\eqalignno{& \quad \quad \left(1- e^{- {{\omega (n^{2+\alpha }\log n)}\over {4}}\delta ^{2}}\right)^{{\mathhat {T}}_{n}}\cr& \cdot \prod _{k=1}^{n-1}\left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{T}}_{k}}\cr&\qquad >\left(1-n^{-\omega (n^{2+\alpha })\delta ^{2}}\right)^{{\mathhat {T}}_{n}}\cr& \cdot \left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2(n-1) {\mathhat{T}}_{n}}}$$ which is an overwhelming probability. ■

In the proof above, we have proven that a bound on the FHT holds with an overwhelming probability. Furthermore, the proof also shows the convergence of the UMDA on LEADINGOnes: the UMDA will converge to the optimum with an overwhelming probability. The convergence property is ensured by using population sizes of Formula$\omega (n^{2+\alpha }\log n)$ and by accounting for all the random sampling errors in a pessimistic way.

SECTION V

BEST CASE ANALYSIS OF UMDA ON THE BVLEADINGONES PROBLEM

The previous section has shown that the LEADINGOnes problem is EDA-easy for the UMDA. In this section, we study another maximization problem that is unimodal but EDA-hard for the UMDA. The problem, called BVLeadingOnes (BVLO for short), can be regarded as the LEADINGOnes problem with a variation in one bit. It is defined as follows: Formula TeX Source $${\rm \rm BVLO}({\bf x})=\cases{{\rm LO}({\bf x})+n, \hfill & ${\rm LO}({\bf x})\leq n-1$, $x_{n}=0$\hfill \cr {\rm LO}({\bf x}), \hfill & ${\rm LO}({\bf x})< n-1$, $x_{n}=1$ \hfill \cr 3n, \hfill & ${\rm LO}({\bf x})= n$\hfill \cr }\eqno {\hbox{(30)}}$$ where Formula$\forall i=1,\ldots,n\colon x_{i}\in \{0,1\}$ and LO stands for LEADINGOnes. BVLeadingOnes is a unimodal function whose global optimum is Formula${\bf x}^{\ast }=\left(x_{1}^{\ast },\ldots,x_{n}^{\ast }\right)=(1,\ldots,1)$. In this section, we will prove that BVLeadingOnes is EDA-hard for the UMDA.
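To make the definition concrete, here is a direct Python transcription of (30) (a minimal sketch; the function names and the bit-tuple representation are ours):

```python
def leading_ones(x):
    """Number of consecutive 1s at the head of bit string x."""
    count = 0
    for bit in x:
        if bit != 1:
            break
        count += 1
    return count

def bvlo(x):
    """BVLeadingOnes fitness, transcribing the three cases of (30)."""
    n = len(x)
    lo = leading_ones(x)
    if lo == n:              # LO(x) = n: the global optimum (1, ..., 1)
        return 3 * n
    if x[-1] == 0:           # LO(x) <= n - 1 and x_n = 0
        return lo + n
    return lo                # LO(x) < n - 1 and x_n = 1

# With n = 4: (1, 1, 0, 0) scores 2 + 4 = 6, while (1, 1, 0, 1) scores only 2.
assert bvlo((1, 1, 1, 1)) == 12
assert bvlo((1, 1, 0, 0)) == 6
assert bvlo((1, 1, 0, 1)) == 2
```

The assertions already hint at the bias toward a trailing 0 that the following paragraphs analyze.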

Let us look at (30) again. The Formula$n$th bits of the individuals are exposed to the selection pressure from the very beginning. During the optimization process, an individual whose last bit is 0 always has higher fitness than any individual whose last bit is 1, unless the first Formula$n-1$ bits of the latter are all 1's. In other words, the Formula$n$th marginal probability Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ starts converging to 1 from the beginning of the optimization, where Formula$\bar {x}_{n}^{\ast }=1-x_{n}^{\ast }=0$. Once Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ reaches 1, the UMDA will miss the global optimum forever. Therefore, we need to check whether an individual whose first Formula$n-1$ bits are all 1's can be generated before Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ reaches 1.

We start by analyzing the convergence speed of the first Formula$n-1$ bits of individuals, given polynomial population sizes Formula$M=\omega (n^{2+\alpha }\log n)$, Formula$N=\omega (n^{2+\alpha }\log n)$ (where Formula$\alpha$ can be any positive constant), and Formula$M=\beta N$ (Formula$\beta \in (0, 1)$ is some constant) for the UMDA. These bits can be classified into two categories: the first category is exposed to the selection pressure, while the second is affected only by genetic drift. Unlike in the previous section, here we analyze from an optimistic viewpoint: all bits of the first category will converge in one generation, and the genetic drift will promote the marginal probabilities of generating the optimal value on the remaining bits. We first consider the genetic drift of a typical marginal probability, say Formula$p_{\cdot,q}\left(x_{q}^{\ast }\right)$ (the Formula$q$th bits belong to the second category). Using Chernoff bounds to study the deviations brought by the random sampling procedure, we have Formula TeX Source $$\eqalignno{& \BBP\left(N_{t,q}\left(x_{q}^{\ast }\right)\leq (1+\eta)p_{t-1,q}\left(x_{q}^{\ast }\right)N\mid p_{t-1,q}\left(x_{q}^{\ast }\right)\right)\cr& \quad \quad > 1- e^{- {{p_{t-1,q}\left(x_{q}^{\ast }\right)N}\over {4}}\eta ^{2}}}$$ where Formula$\eta$ is a parameter that controls the size of the deviation, and Formula$N_{t,q}\left(x_{q}^{\ast }\right)$ is the number of individuals that take the value Formula$x_{q}^{\ast }$ on their Formula$q$th bits in the population before selection. Setting Formula$\eta =\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}$, we obtain Formula TeX Source $$\eqalignno{& \quad \BBP\biggl(N_{t,q}\left(x_{q}^{\ast}\right)\leq\left(1+\left({{1}\over {n}}\right)^{1+ {{\alpha}\over {2}}}\right)p_{t-1,q}\left(x_{q}^{\ast }\right)N\cr& \quad\quad \quad \quad \quad \quad \quad \mid p_{t-1,q}\left(x_{q}^{\ast }\right)\biggr)\cr& > 1- e^{- {{p_{t-1,q}\left(x_{q}^{\ast }\right)\omega (\log n)}\over {4}}}=1-n^{- {{p_{t-1,q}\left(x_{q}^{\ast }\right)\omega (1)}\over {4}}}.}$$

The selection procedure may also introduce deviations. Since the Formula$q$th bits of individuals are not exposed to the selection pressure, for these bits the selection procedure can again be regarded as simple random sampling without replacement. Lemma 4 can be used to estimate the probability that the number of individuals taking the value Formula$x_{q}^{\ast }$ on their Formula$q$th bits after selection [denoted by Formula$N^{(s)}_{t,q}\left(x_{q}^{\ast }\right)$] is bounded from above; this probability is lower bounded by Formula$1- e^{-2(1+\eta)^{2}p^{2}_{t-1,q}\left(x_{q}^{\ast }\right)\eta ^{\prime 2}M}$, as estimated by (23) in Table VII, where Formula$\eta ^{\prime}$ is a parameter that controls the size of the deviation, and Formula$N^{(s)}_{t,q}\left(x_{q}^{\ast }\right)=p_{t,q}\left(x_{q}^{\ast }\right)M$. Letting Formula$\eta ^{\prime}=\eta =\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}$, since Formula$M=\omega (n^{2+\alpha }\log n)$ we get Formula TeX Source $$\eqalignno{& \BBP\left(p_{t,q}\left(x_{q}^{\ast }\right)\leq\left(1+\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}p_{t-1,q}\left(x_{q}^{\ast }\right)\mid p_{t-1,q}\left(x_{q}^{\ast }\right)\right)\cr&\quad \quad \quad \quad \quad > \left(1-n^{-p_{t-1,q}\left(x_{q}^{\ast }\right)\omega (1)}\right)\cr&\quad \quad \qquad\quad \quad \cdot \left(1-n^{-\left(1+\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}p^{2}_{t-1,q}\left(x_{q}^{\ast }\right)\omega (1)}\right)\cr&\quad \quad \quad\quad \quad >\left(1-n^{-\left(1+\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}p^{2}_{t-1,q}\left(x_{q}^{\ast }\right)\omega(1)}\right)^{2}.}$$

Table 7
TABLE VII BOUNDING Formula$N^{(s)}_{t,q}\left(x_{q}^{\ast }\right)$ FROM ABOVE WITH AN OVERWHELMING PROBABILITY

Since Formula$R=\left (1+\left ({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}>1$ (thus we know that Formula${\mathhat {p}}_{t-1,q}\left(x_{q}^{\ast }\right)> {\mathhat {p}}_{0,q}\left(x_{q}^{\ast }\right)$ in the above inequality), similar to the analysis shown in Table III, we further have Formula TeX Source $$\eqalignno{& \quad \BBP\biggl(p_{t,q}\left(x_{q}^{\ast}\right)\leq\left(1+\left({{1}\over {n}}\right)^{1+ {{\alpha}\over {2}}}\right)^{2t}p_{0,q}\left(x_{q}^{\ast }\right)\cr&\quad \quad \quad \quad \quad \quad \quad \mid p_{0,q}\left(x_{q}^{\ast }\right)= {\mathhat{p}}_{0,q}\left(x_{q}^{\ast }\right)\biggr)\cr& > \left(1-n^{-\left(1+\left({{1}\over {n}}\right)^{1+ {{\alpha }\over{2}}}\right)^{2} {\mathhat {p}}^{2}_{0,q}\left(x_{q}^{\ast}\right)\omega(1)}\right)^{2t}.}$$ Given any polynomial Formula$t$, the above probability is an overwhelming one. Specifically, Formula$\forall t=O(n)$, Formula$p_{t,q}\left(x_{q}^{\ast }\right)$ is upper bounded as Formula TeX Source $$\eqalignno{& \quad p_{t,q}\left(x_{q}^{\ast }\right)\leq \left(1+\left({{1}\over {n}}\right)^{1+ {{\alpha }\over{2}}}\right)^{O(n)} {\mathhat {p}}_{0,q}\left(x_{q}^{\ast }\right) \cr& = {{1}\over {2}}+\Theta \left({{1}\over {n^{\alpha /2}}}\right)+o\left({{1}\over{n^{\alpha /2}}}\right)< c< 1& {\hbox{(31)}}}$$ with an overwhelming probability (where Formula$c$ is some positive constant, and the Formula$q$th bits are not exposed to the selection pressure).
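The drift behavior bounded in (31) is easy to observe empirically. A minimal simulation sketch of one selection-neutral bit under the UMDA's sample-then-select cycle (all parameter values are assumptions for illustration):

```python
import numpy as np

# A minimal sketch of genetic drift on one selection-neutral bit: each
# generation, sample N bits from Bernoulli(p), then keep M of them by a
# selection that is blind to this bit (simple random sampling without
# replacement, hence hypergeometric).
rng = np.random.default_rng(1)

n = 50
N = 200_000                   # assumed population size before selection
M = N // 2                    # assumed population size after selection
p = 0.5                       # initial marginal p_{0,q}(x_q^*)

for t in range(n):            # O(n) generations, as in the analysis
    ones = rng.binomial(N, p)                     # random sampling step
    kept = rng.hypergeometric(ones, N - ones, M)  # selection step
    p = kept / M

print(f"neutral marginal after {n} generations: {p:.4f}")
# With populations this large, p stays near 1/2, i.e., below some constant
# c < 1, which is the content of (31).
```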

Another key issue of our analysis is the time Formula$T_{n}^{\prime}$ for the Formula$n$th marginal probability Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ to converge to 1. We can prove the following lemma.

Lemma 6

The number of generations required by the marginal probability Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ to converge to 1, i.e., Formula$T_{n}^{\prime}$, is upper bounded by Formula TeX Source $$U= {{\ln{{2M}\over {N}}-\ln (1-\delta)}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}+2$$ with an overwhelming probability, if no global optimum is generated before the Formula$U$th generation, where Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1- {{M}\over {N}}\right)$ is a positive constant.

The proof is provided in the Appendix. Given polynomial population sizes Formula$M=\omega (n^{2+\alpha }\log n)$, Formula$N=\omega (n^{2+\alpha }\log n)$ (where Formula$\alpha$ can be any positive constant), and Formula$M=\beta N$ (Formula$\beta \in (0,1)$ is some constant), Lemma 6 implies that Formula$U=\Theta (1)$, as the following short calculation shows.
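Indeed, substituting Formula$M=\beta N$ into the expression for Formula$U$ (a routine evaluation, using only the constants already introduced) gives Formula TeX Source $$U= {{\ln (2\beta)-\ln (1-\delta)}\over {\ln (1-\delta)-\ln \beta }}+2=\Theta (1)$$ since Formula$\beta$ and Formula$\delta$ are constants with Formula$\beta < 1-\delta < 2\beta$, so both the numerator and the denominator are positive constants. Now we reach the following theorem.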

Theorem 3

Given polynomial population sizes Formula$M=\omega (n^{2+\alpha }\log n)$, Formula$N=\omega (n^{2+\alpha }\log n)$ (where Formula$\alpha$ can be any positive constant), and Formula$M=\beta N$ (Formula$\beta \in (0,1)$ is some constant), the FHT of the UMDA with truncation selection on the BVLeadingOnes problem is infinity with an overwhelming probability. In other words, the UMDA with truncation selection cannot find the optimum of the BVLeadingOnes problem with an overwhelming probability.

Proof

We have proven in Lemma 6 that the number of generations required for Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ to reach 1 (denoted by Formula$T_{n}^{\prime}$) is upper bounded by a constant Formula$U$ with an overwhelming probability, under the condition that no global optimum is generated before the Formula$U$th generation. We now further prove that the probability that no global optimum is generated before the Formula$U$th generation is also overwhelming.

As mentioned before, we classify the first Formula$n-1$ bits of individuals into two categories. The first category, which contains the bits exposed to the selection pressure, can be further split into two types. The first type contains the bits that have already converged to the optimal values, and the second type contains the bits that are exposed to the selection pressure but have not converged to the optimal values yet. In our best case analysis, for the bits of the second type, we consider that only one generation is needed for the corresponding marginal probabilities (of the optimal values) to converge. In other words, before the Formula$U$th generation, the marginal probabilities (of the first Formula$n-1$ bits of individuals) are either 1 or no more than the constant Formula$c$. Noting that Formula$U=\Theta (1)$, according to (31) we have Formula$c\in \left({{1}\over {2}},1\right)$, which captures the effect of genetic drift within Formula$O(n)$ generations. From an optimistic viewpoint, we further consider that in every generation, besides the marginal probability Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$, at most Formula$\log ^{2} n$ other marginal probabilities (see footnote 7) are also converging with an overwhelming probability. Formula$\log ^{2}n$ is used here because the joint probability of generating Formula$\log ^{2}n$ consecutive 1's (so as to produce the selection pressure on the corresponding bits) by Formula$\log ^{2}n$ non-converged marginal probabilities is no more than Formula$c^{\log ^{2}n}$, which is super-polynomially small.
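To see that Formula$c^{\log ^{2}n}$ is indeed super-polynomially small, a one-line check suffices (using only Formula$c< 1$ and taking Formula$\log$ to base Formula$e$; other bases change only the constant): Formula TeX Source $$c^{\log ^{2}n}=e^{-\left(\ln {{1}\over {c}}\right)\log ^{2}n}=n^{-\left(\ln {{1}\over {c}}\right)\log n}$$ which is smaller than Formula$n^{-k}$ for every constant Formula$k$ once Formula$n$ is large enough.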

The above result implies that the probability of generating the global optimum in one generation is super-polynomially small. Noting that Formula$U=\Theta (1)$, the probability of generating the optimum before the Formula$U$th generation is then also super-polynomially small. Combining this probability with the conditional probability given in Lemma 6, we know that the joint probability that no global optimum is generated before the Formula$U$th generation, and that Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ converges to 1 no later than the Formula$U$th generation, is super-polynomially close to 1, i.e., an overwhelming probability. Combining this with the fact that once the Formula$n$th marginal probability Formula$p_{\cdot,n}\left(x_{n}^{\ast }\right)$ has converged to 0, the probability of finding the optimum drops to 0, we have proven the theorem.

According to Theorem 1, given polynomial population sizes Formula$M=\omega (n^{2+\alpha }\log n)$ and Formula$N=\omega (n^{2+\alpha }\log n)$ (Formula$M=\beta N$, where Formula$\beta \in (0,1)$ is a constant), BVLeadingOnes is EDA-hard for the UMDA. ■

For the sake of consistency, we also provide the formal description of the deterministic dynamic system utilized in this section. Consider the Formula$i$th stage Formula$\left(i\leq \min \left\{T_{n}^{\prime}, {{n-1}\over {\log ^{2}n}}\right\}\right)$, which starts when all the marginal probabilities Formula$p_{\cdot,k}\left(x_{k}^{\ast }\right) (k\leq (i-1)\log ^{2} n)$ have just converged to 1 and ends when all the marginal probabilities Formula$p_{\cdot,j}\left(x_{j}^{\ast }\right) (j\leq i\log ^{2} n)$ have just converged to 1. We can obtain Formula${\mathhat {{\bf P}}}_{t+1}({\bf x}^{\ast })$ by defining Formula$\gamma _{i}$ as follows. Formula TeX Source $$\eqalignno{& \quad \quad {\mathhat {{\bf P}}}_{t+1}({\bf x}^{\ast})=\gamma _{i}\left({\mathhat {{\bf P}}}_{t}({\bf x}^{\ast})\right)=\cr& \biggl({\mathhat {p}}_{t,1}\left(x_{1}^{\ast}\right),\ldots, {\mathhat {p}}_{t,(i-1)\log^{2}n}\left(x_{(i-1)\log ^{2} n}^{\ast }\right),1,\ldots,1,\cr&\quad R {\mathhat {p}}_{t,i\log ^{2} n+1}\left(x_{i\log^{2}n+1}^{\ast }\right),\ldots,R {\mathhat{p}}_{t,n-1}\left(x_{n-1}^{\ast }\right),\cr& \quad \quad \quad\quad \quad \quad \quad \quad1-G\left(1- {\mathhat{p}}_{t,n}\left(x_{n}^{\ast }\right)\right)\biggr)}$$ where Formula$R=(1+\eta)(1+\eta ^{\prime})$ (Formula$\eta < 1$ and Formula$\eta ^{\prime}< 1$ are positive functions of the problem size Formula$n$), and Formula$G=(1-\delta) {{N}\over {M}}$ (Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1- {{M}\over {N}}\right)$ is a constant). In the above equation, we consider four different cases.

  1. Formula$j\in \{1,\ldots,(i-1)\log ^{2} n\}$. In the deterministic system above, the marginal probabilities Formula${\mathhat {p}}_{t,j}\left(x_{j}^{\ast }\right)$ have converged to 1, thus at the next generation they will not change.
  2. Formula$j\in \{(i-1)\log ^{2} n+1,\ldots,i\log ^{2} n\}$. In the deterministic system above, the marginal probabilities Formula${\mathhat {p}}_{t,j}\left(x_{j}^{\ast }\right)$ are converging to the optimum, and they will converge in one generation in the best case analysis.
  3. Formula$j\in \{i\log ^{2} n+1,\ldots,n-1\}$. The Formula$j$th bits of individuals are not exposed to selection pressure, and we use the factor Formula$R=(1+\eta)(1+\eta ^{\prime})$ to demonstrate the impact of genetic drift in the deterministic system above.
  4. Formula$j=n$. The marginal probability Formula${\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)=1- {\mathhat {p}}_{t,n}\left(x_{n}^{\ast }\right)$ is converging, and we use the factor Formula$G=(1-\delta) {{N}\over {M}}$ to demonstrate the impact of selection pressure on this converging marginal probability in the deterministic system above, which reflects the best case for Formula${\mathhat {p}}_{t,n}\left(x_{n}^{\ast }\right)$.

With Formula${\mathhat {{\bf P}}}_{0}({\bf x}^{\ast })=\left({{1}\over {2}},\ldots, {{1}\over {2}}\right)$, noting that one stage actually refers to one generation (thus Formula$i=t$), we have Formula TeX Source $${\mathhat {{\bf P}}}_{t}({\bf x}^{\ast })=\gamma _{t}\circ \gamma _{t-1}\ldots\circ \gamma _{1}\left({\mathhat {{\bf P}}}_{0}({\bf x}^{\ast })\right)$$ where Formula$t\leq \min \left\{T_{n}^{\prime}, {{n-1}\over {\log ^{2} n}}\right\}$. Since Formula$\{\gamma _{i}\}_{i=1}^{t}$ de-randomizes the whole optimization process, Formula$T_{n}^{\prime}$ in the above equation is no longer a random variable. For the sake of clarity, we rewrite the above equation as Formula TeX Source $${\mathhat {{\bf P}}}_{t}({\bf x}^{\ast })=\gamma _{t}\circ \gamma _{t-1}\ldots\circ \gamma _{1}\left({\mathhat {{\bf P}}}_{0}({\bf x}^{\ast })\right)$$ where Formula$t\leq \min \left\{{\mathhat {T}}_{n}^{\prime}, {{n-1}\over {\log ^{2} n}}\right\}\leq \min \left\{U, {{n-1}\over {\log ^{2} n}}\right\}$.
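The following minimal Python sketch (illustrative constants; not the exact Formula$R$ and Formula$G$ of the proof, whose values depend on Formula$\eta$, Formula$\eta ^{\prime}$, and Formula$\delta$) iterates this de-randomized system and exhibits the race that the proof formalizes: Formula${\mathhat {p}}_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ hits 1 after Formula$\Theta (1)$ generations, while at most Formula$\log ^{2}n$ of the leading marginals can converge per generation.

```python
import math

# A minimal sketch of the de-randomized system for BVLeadingOnes
# (illustrative constants only).
n = 1024
block = int(math.log(n) ** 2)    # marginals converging per stage (= generation)
G = 1.5                          # assumed (1 - delta) * N / M > 1
p_bad = 0.5                      # \hat p_{t,n}(xbar_n^*): marginal of the wrong last bit

t = converged = 0
while p_bad < 1.0:
    p_bad = min(1.0, G * p_bad)                # selection drives x_n toward 0
    converged = min(n - 1, converged + block)  # best case for the leading bits
    t += 1

print(f"p_(.,n)(xbar_n^*) hits 1 at generation {t};",
      f"only {converged} of {n - 1} leading marginals converged")
# Since G is a constant, t = Theta(1); hence only O(log^2 n) of the n - 1
# leading marginals can have converged, and the optimum is missed.
```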

SECTION VI

A MODIFIED UMDA: RELAXATION BY MARGINS

So far we have seen both EDA-easy and EDA-hard problems for the UMDA. This section analyzes in more depth the relationship between EDA-hardness and the algorithms. The BVLeadingOnes problem, which has been proven EDA-hard for the UMDA with finite populations, will serve as the target problem in this section. We will show that a simple “relaxed” version of the UMDA with truncation selection can solve the BVLeadingOnes problem efficiently. The “relaxation” is implemented by adding “margins” to the marginal probabilities of the UMDA. That is, the highest level a marginal probability can reach is Formula$1- {{1}\over {M}}$ and the lowest level it can drop to is Formula${{1}\over {M}}$. Any marginal probability higher than Formula$1- {{1}\over {M}}$ is set to Formula$1- {{1}\over {M}}$, and any marginal probability lower than Formula${{1}\over {M}}$ is set to Formula${{1}\over {M}}$. We denote such a UMDA with margins by Formula${\rm UMDA}_{M}$. The margins aim to avoid premature convergence, similar to the upper and lower bounds on the pheromone values in the Max-Min Ant System [40] and to Laplace correction [2]. It is noteworthy that we are not trying to propose a new algorithm here. Instead, by an example, we try to demonstrate theoretically that some approaches proposed to avoid premature convergence of EDAs can actually help to improve the performance of the algorithms.
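As a concrete illustration, the margin device amounts to one clamping step after the usual frequency estimate (a minimal sketch; the function name is ours):

```python
def update_marginal_with_margin(ones_after_selection, M):
    """UMDA_M-style marginal update: estimate the frequency of value 1
    among the M selected individuals, then clamp it into
    [1/M, 1 - 1/M] so no marginal can fix prematurely at 0 or 1."""
    p = ones_after_selection / M
    lo, hi = 1.0 / M, 1.0 - 1.0 / M
    return min(max(p, lo), hi)

# Even if every selected individual carries a 1 on this bit, the marginal
# stays below 1, leaving a 1/M chance of sampling a 0 (and vice versa).
M = 1000
assert update_marginal_with_margin(M, M) == 1 - 1 / M
assert update_marginal_with_margin(0, M) == 1 / M
```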

We have seen in the previous section that the original UMDA cannot solve BVLeadingOnes efficiently. Interestingly, by adding the margins, the Formula${\rm UMDA}_{M}$ can solve BVLeadingOnes efficiently. The following theorem summarizes the main result.

Theorem 4

Given polynomial population sizes Formula$N=\omega (n^{2+\alpha }\log n)$, Formula$M=\omega (n^{2+\alpha }\log n)$ (where Formula$\alpha$ can be any positive constant) and Formula$M=\beta N$ (Formula$\beta \in (0,1)$ is some constant), then for any constant Formula$\delta$ that satisfies Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1-e^{{1}\over {\epsilon (n)}} {{M}\over {N}}\right)$ (where Formula$\epsilon (n)= {{M}\over {n}}$), the first hitting time Formula$\tau$ of the Formula${\rm UMDA}_{M}$ with truncation selection (initialized with a uniform distribution) satisfies Formula TeX Source $$\eqalignno{& \tau < \bar {\tau }= {{\left(\ln{{e(M-1)}\over {N}}-\ln (1-\delta)\right)n\epsilon (n)+n}\over {\epsilon (n)\ln (1-\delta)+\epsilon (n)\ln \left({{N}\over {M}}\right)-1}}\cr& \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + {{M}\over{N}}\ln ^{2}n+2n}$$ with the overwhelming probability Formula TeX Source $$\eqalignno{& \left(1-n^{-e^{-1/\epsilon (n)}\omega (n^{2+\alpha })\delta ^{2}/2e}\right)^{2\bar {\tau}}\cr& \quad \cdot \left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2(n-1)\bar {\tau}}\cr& \quad \cdot \left (1-\left({{1}\over {e}}\right)^{\omega (\ln n)}\right).}$$

Proof

In order to prove the above theorem, we define Formula$n+1$ random variables Formula$t_{0}$ and Formula$t_{i} (i=1,\ldots,n)$ as follows: Formula TeX Source $$\eqalignno{& t_{0} \triangleq \min\left\{t;p_{t,n}\left(\bar {x}^{\ast }_{n}\right)=1- {{1}\over {M}}\right\}\cr& t_{i} \triangleq \min\left\{t;p_{t,i}\left(x^{\ast }_{i}\right)=1- {{1}\over {M}}\right\}.}$$ The proof follows our basic idea introduced in Section III-A, and is thus similar to the proof of Theorem 2. However, the maximal value that a marginal probability can reach drops to Formula$1- {{1}\over {M}}$, and the minimal value that a marginal probability can take increases to Formula${{1}\over {M}}$. We will then de-randomize the Formula${\rm UMDA}_{M}$.

In the analysis, we ignore the possibility that the optimum is found before the Formula$t_{0}$th generation (which would only make the FHT smaller), and we divide the optimization process into Formula$n+1$ stages. The 1st stage begins when the optimization begins, and ends when the marginal probability Formula${\mathhat {p}}_{\cdot,n}\left(\bar {x}^{\ast }_{n}\right)$ reaches Formula$1- {{1}\over {M}}$ for the first time. The 2nd stage follows the 1st stage, and ends when the marginal probability Formula${\mathhat {p}}_{\cdot,1}\left(x^{\ast }_{1}\right)$ reaches Formula$1- {{1}\over {M}}$ for the first time. The Formula$q$th stage Formula$(q\in \{3,\ldots,n+1\})$ begins when the marginal probability Formula${\mathhat {p}}_{\cdot,q-2}\left(x^{\ast }_{q-2}\right)$ reaches Formula$1- {{1}\over {M}}$ for the first time, and ends when the marginal probability Formula${\mathhat {p}}_{\cdot,q-1}\left(x^{\ast }_{q-1}\right)$ reaches Formula$1- {{1}\over {M}}$ for the first time.

Let us consider the deterministic system. Suppose generation Formula$t+1$ belongs to the Formula$i$th stage Formula$(i\in \{1,\ldots,n+1\})$, then the marginal probabilities at this generation are updated from the marginal probabilities at generation Formula$t$ by Formula$\gamma _{i}$. When Formula$i=1$, we have Formula TeX Source $$\eqalignno{& {\mathhat {{\bf P}}}_{t+1}({\bf x}^{\ast})=\gamma _{1}\left({\mathhat {{\bf P}}}_{t}({\bf x}^{\ast})\right)=\cr& \biggl(R {\mathhat {p}}_{t,1}\left(x_{1}^{\ast}\right),\ldots,R {\mathhat {p}}_{t,n-1}\left(x_{n-1}^{\ast}\right),\cr& \quad \quad \quad \quad \quad \quad \quad\quad1-G_{1}\left(1- {\mathhat {p}}_{t,n}\left(x_{n}^{\ast}\right)\right)\biggr)}$$ where Formula$R=(1-\eta)(1-\eta ^{\prime})$(Formula$\eta < 1$ and Formula$\eta ^{\prime}< 1$ are positive functions of the problem size Formula$n$), and Formula$G_{1}=(1-\delta) {{N}\over {M}}$(Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1-e^{{1}\over {\epsilon (n)}} {{M}\over {N}}\right)$ is a constant). In the above equation, we consider two different cases.

  1. Formula$j\in \{1,\ldots,n-1\}$. In the deterministic system above, the Formula$j$th bits of individuals are not exposed to selection pressure, and we use the factor Formula$R=(1-\eta)(1-\eta ^{\prime})$ to demonstrate the impact of genetic drift on these marginal probabilities.
  2. Formula$j=n$. In the deterministic system above, the marginal probability Formula${\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)=1- {\mathhat {p}}_{t,n}\left(x_{n}^{\ast }\right)$ is increasing, and we use the factor Formula$G_{1}=(1-\delta) {{N}\over {M}}$ to demonstrate the impact of selection pressure on the increasing marginal probability Formula${\mathhat {p}}_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$(Formula${\mathhat {p}}_{t+1,n}\left(\bar {x}_{n}^{\ast }\right)=G_{1} {\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)$, thus Formula${\mathhat {p}}_{t+1,n}\left(x_{n}^{\ast }\right)=1-G_{1} {\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)=1-G_{1}(1- {\mathhat {p}}_{t,n}\left(x_{n}^{\ast }\right))$ holds).

When Formula$i\in \{2,\ldots,n\}$, we have Formula TeX Source $$\eqalignno{& \quad {\mathhat {{\bf P}}}_{t+1}({\bf x}^{\ast})=\gamma _{i}\left({\mathhat {{\bf P}}}_{t}({\bf x}^{\ast})\right)\cr& =\biggl({\mathhat {p}}_{t,1}\left(x_{1}^{\ast}\right),\ldots, {\mathhat {p}}_{t,i-2}\left(x_{i-2}^{\ast}\right),\cr& \quad \quad \quad G_{2} {\mathhat{p}}_{t,i-1}\left(x_{i-1}^{\ast }\right),R {\mathhat{p}}_{t,i}\left(x_{i}^{\ast }\right),\ldots,\cr& \quad \quad\quad \quad \quad \quad R {\mathhat{p}}_{t,n-1}\left(x_{n-1}^{\ast }\right), {\mathhat{p}}_{t,n}\left(x_{n}^{\ast }\right)\biggr)}$$ where Formula$G_{2}=(1-\delta)\left(1- {{1}\over {M}}\right)^{n} {{N}\over {M}}$ (Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1-e^{{1}\over {\epsilon (n)}} {{M}\over {N}}\right)$ is a constant), and Formula$R=(1-\eta)(1-\eta ^{\prime})$ (Formula$\eta < 1$ and Formula$\eta ^{\prime}< 1$ are positive functions of the problem size Formula$n$). In the above equation, we consider four different cases for the deterministic system above.

  1. Formula$j\leq i-2$, Formula$j\in \BBN ^{+}$. The marginal probabilities Formula${\mathhat {p}}_{t,j}\left(x_{j}^{\ast }\right)$ have reached Formula$1- {{1}\over {M}}$, and at the next generation they will not change (we will soon prove this).
  2. Formula$j=i-1$. The marginal probability Formula${\mathhat {p}}_{t,j}\left(x_{j}^{\ast }\right)$ is increasing, and we use the factor Formula$G_{2}=(1-\delta)\left(1- {{1}\over {M}}\right)^{n} {{N}\over {M}}$ to demonstrate the impact of selection pressure on this increasing marginal probability.
  3. Formula$j\in \{i,\ldots,n-1\}$. The Formula$j$th bits of individuals are not exposed to selection pressure, and we use the factor Formula$R=(1-\eta)(1-\eta ^{\prime})$ to demonstrate the impact of genetic drift on these marginal probabilities.
  4. Formula$j=n$. The marginal probabilities Formula${\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)$ and Formula${\mathhat {p}}_{t,n}\left(x_{n}^{\ast }\right)$ have reached Formula$1- {{1}\over {M}}$ and Formula${{1}\over {M}}$, respectively, and at the next generation they will not change (we will soon prove this).

Considering the Formula$(n+1)$th stage, we have Formula TeX Source $$\eqalignno{& \quad {\mathhat {{\bf P}}}_{t+1}({\bf x}^{\ast })=\gamma _{n+1}({\mathhat {{\bf P}}}_{t}({\bf x}^{\ast }))\cr& =\left({\mathhat {p}}_{t,1}\left(x_{1}^{\ast }\right),\ldots, {\mathhat {p}}_{t,n-1}\left(x_{n-1}^{\ast }\right), {\mathhat {p}}_{t,n}\left(x_{n}^{\ast }\right)\right)}$$ where we consider two different cases for this deterministic system.

  1. Formula$j\in \{1,\ldots,n-1\}$. The marginal probabilities Formula${\mathhat {p}}_{t,j}\left(x_{j}^{\ast }\right)$ have reached Formula$1- {{1}\over {M}}$, and at the next generation they will not change (we will soon prove this).
  2. Formula$j=n$. The marginal probability Formula${\mathhat {p}}_{t,n}\left(x_{n}^{\ast }\right)$ is always no smaller than Formula${{1}\over {M}}$.

With Formula${\mathhat {{\bf P}}}_{0}({\bf x}^{\ast })=\left({{1}\over {2}},\ldots, {{1}\over {2}}\right)$, we have Formula TeX Source $${\mathhat {{\bf P}}}_{t}({\bf x}^{\ast })=\gamma _{i}^{t-t_{i-2}}\left({\mathhat {{\bf P}}}_{t_{i-2}}({\bf x}^{\ast })\right)$$ where Formula$t_{i-2}< t\leq t_{i-1} (i=1,\ldots,n+1)$, and we let Formula$t_{-1}=0$ represent the beginning of the optimization process. Since Formula$\{\gamma _{i}\}_{i=1}^{n+1}$ de-randomizes the whole optimization process, Formula$\{t_{i}\}_{i=0}^{n}$ in the above equation are no longer random variables. For the sake of clarity, we rewrite the above equation as Formula TeX Source $${\mathhat {{\bf P}}}_{t}({\bf x}^{\ast })=\gamma _{i}^{t- {\mathhat {t}}_{i-2}}\left({\mathhat {{\bf P}}}_{{\mathhat {t}}_{i-2}}({\bf x}^{\ast })\right)$$ where Formula${\mathhat {t}}_{i-2}< t\leq {\mathhat {t}}_{i-1} (i=1,\ldots,n+1)$. As we will show immediately, Formula${\mathhat {t}}_{i} (0\leq i\leq n)$ is an upper bound of the random variable Formula$t_{i}$ with some probability. Once all Formula${\mathhat {t}}_{i}$ have been estimated and all the marginal probabilities Formula$p_{t,j}\left(x_{j}^{\ast }\right) (j=1,\ldots,n)$ have reached Formula$1- {{1}\over {M}}$, the optimum might already have been found, or it will take only a few further steps to generate it. Thus, if we can prove that once the marginal probabilities Formula$p_{t,j}\left(x_{j}^{\ast }\right) (j=1,\ldots,n-1)$ have reached Formula$1- {{1}\over {M}}$ they will never decrease again, our task finally becomes calculating Formula${\mathhat {t}}_{n}$ and the probability that Formula${\mathhat {t}}_{n}$ holds as an upper bound of Formula$t_{n}$.

We now provide the formal proof stage by stage. At the 1st stage, we analyze the case of the Formula$n$th bit. At the Formula$t$th generation (which belongs to the 1st stage), according to Lemma 5 and Chernoff bounds, we have Formula TeX Source $$\eqalignno{& \quad \BBP\biggl(p_{t,n}\left(\bar {x}_{n}^{\ast}\right)\geq (1-\delta) {{p_{t-1,n}\left(\bar {x}_{n}^{\ast}\right)N}\over {M}}\cr& \quad \quad \quad \quad \quad \quad\quad \mid p_{t-1,n}\left(\bar {x}_{n}^{\ast }\right) \leq {{M-1}\over{N(1-\delta)}}\biggr)\cr& > 1- e^{-p_{t-1,n}\left(\bar{x}_{n}^{\ast }\right)N\delta ^{2}/2}}$$ where Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1-e^{{1}\over {\epsilon (n)}} {{M}\over {N}}\right)$ is a positive constant, and Formula$p_{t,n}\left(\bar {x}_{n}^{\ast }\right)\leq 1- {{1}\over {M}}$ (since the Formula${\rm UMDA}_{M}$ adopts margins) yields the condition that Formula$p_{t-1,n}\left(\bar {x}_{n}^{\ast }\right) \leq {{M-1}\over {N(1-\delta)}}$. Similar to Table III in the proof of Theorem 2, we can obtain Formula TeX Source $$\eqalignno{& \BBP\left(p_{t,n}\left(\bar {x}_{n}^{\ast }\right)\geq G_{1}^{t}p_{0,n}\left(\bar {x}_{n}^{\ast }\right)\mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\right) \cr& \quad \quad >\left(1-e^{-p_{0,n}\left(\bar {x}_{n}^{\ast }\right)N\delta ^{2}/2}\right)^{t}.& {\hbox{(36)}}}$$

Considering the probability that Formula$t_{0}$ is upper bounded by some value, say Formula${\mathhat {t}}_{0}$, we obtain the inequalities estimated in Table VIII, where in (33) the factor Formula$\left(1-e^{-{{\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)N}\delta ^{2}/2}\right)$ is added since we apply Chernoff bounds at the end of the Formula$({\mathhat {t}}_{0}-1)$th generation. Now we consider the following item: Formula TeX Source $$\eqalignno{& \BBP\left({\mathhat {p}}_{{\mathhat {t}}_{0}-1,n}\left(\bar {x}_{n}^{\ast }\right)> {{M-1}\over {N(1-\delta)}}\mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\right) \cr& = \BBP\left({\mathhat {p}}_{{\mathhat {t}}_{0}-1,n}\left(\bar {x}_{n}^{\ast }\right)> {{M-1}\over {N(1-\delta)}}\right).& {\hbox{(37)}}}$$ Since Formula$\left\{{\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)\right\}_{t=0}^{\infty }$ is a deterministic sequence, the probability above must be either 0 or 1. We need to find the value of Formula${\mathhat {t}}_{0}$ that makes this probability 1. Given that Formula${\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {{1}\over {2}}$, the definition of Formula${\mathhat {t}}_{0}$ (it is an upper bound of Formula$t_{0}$ defined at the beginning of the proof) and the condition that Formula$\forall t< {\mathhat {t}}_{0}-1\colon {{M-1}\over {N(1-\delta)}}> {\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)>(1-\delta) {{{\mathhat {p}}_{t-1,n}\left(\bar {x}_{n}^{\ast }\right)N}\over {M}}$ together imply Formula TeX Source $$\eqalignno{&\quad \quad \quad G_{1}^{{\mathhat {t}}_{0}-2} {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\cr& \quad =\left ((1-\delta)\left({{N}\over {M}}\right)\right)^{{\mathhat {t}}_{0}-2} {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)< {{M-1}\over {N(1-\delta)}}\cr& G_{1}^{{\mathhat {t}}_{0}-1} {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\cr& \quad =\left ((1-\delta)\left({{N}\over {M}}\right)\right)^{{\mathhat {t}}_{0}-1} {\mathhat{p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\geq {{M-1}\over {N(1-\delta)}}.}$$ Hence, we obtain the value of Formula${\mathhat {t}}_{0}$ Formula TeX Source $${\mathhat {t}}_{0}\leq {{\ln{{2M-2}\over {N}}-\ln (1-\delta)}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}+2.$$ Now we can continue to estimate the probability mentioned in (32), which provides the probability that Formula$t_{0}$ is upper bounded by Formula${\mathhat {t}}_{0}$. Similar to (25) in the proof of Theorem 2, according to (36), we obtain that this probability is at least Formula$\left(1-e^{-p_{0,n}\left(\bar {x}_{n}^{\ast }\right)N\delta ^{2}/2}\right)^{{\mathhat {t}}_{0}}$.

Table 8
TABLE VIII CALCULATION OF PROBABILITY THAT Formula$t_{0}$ IS UPPER BOUNDED BY Formula${\mathhat {t}}_{0}$

On the other hand, we can deal with the genetic drift in the same way as we did for Theorem 2: since Formula${\mathhat {t}}_{0}=\Theta (1)$, when Formula$t= {\mathhat {t}}_{0}$, for the marginal probabilities of the other bits, a level of Formula${{1}\over {e}}$ can be maintained with at least the overwhelming probability of Formula TeX Source $$\left(1- e^{- {{\omega (n^{2+\alpha }\log n)}\over {2e}}\delta ^{2}}\right)^{{\mathhat {t}}_{0}}\left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{t}}_{0}}$$ where the second factor Formula$\left(1-n^{-\left (1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat {t}}_{0}}$ comes from the analysis of genetic drift [please refer to (26) for details]. The remaining details are very similar to those in the proof of Theorem 2 and are omitted for the sake of brevity. Now we have finished the analysis of the 1st stage.

After the marginal probability Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ has reached Formula$1- {{1}\over {M}}$, i.e., Formula$t\geq {\mathhat {t}}_{0}$, Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ will not drop to a level smaller than Formula$1- {{1}\over {M}}$ again unless the algorithm has found the optimum. In fact, a similar fact holds for the other marginal probabilities. In order to prove it, let us consider the Formula$(i+1)$th stage Formula$(1\leq i< n)$, and we use the factor Formula$G_{2}$ to demonstrate the impact of selection, by which the interactions among bits are taken into account. For the Formula$i$th bit, at the Formula$k$th generation, we can investigate the following situation: Formula TeX Source $$\eqalignno{& p_{k,i}\left(x_{i}^{\ast }\right)< 1- {{1}\over {M}},\cr& \forall j\leq i-1\colon p_{k,j}\left(x_{j}^{\ast }\right)=1- {{1}\over {M}}.}$$ We will then prove that once the marginal probabilities Formula$p_{\cdot,j}\left(x_{j}^{\ast }\right)$ Formula$(1\leq j\leq i-1)$ reach Formula$1- {{1}\over {M}}$, none of them will decrease again with an overwhelming probability. Let Formula$r_{k+1}\left((1^{i-1}\ast \ast \cdots \ast 0)\right)$ be the proportion of individuals Formula$(1^{i-1}\ast \ast \cdots \ast 0)$ before selection at the Formula$(k+1)$th generation, where each ∗ can be either 0 or 1. According to Chernoff bounds, and with Formula$N>M=\epsilon (n)n$, we have Formula TeX Source $$\eqalignno{& \BBP\Biggl(r_{k+1}\left((1^{i-1}\ast \ast \cdots\ast 0)\right)>(1-\delta)\left (1- {{1}\over {M}}\right)^{i}\cr& \mid p_{k,n}\left(\bar {x}_{n}^{\ast }\right)=1- {{1}\over {M}},\forall j\leq i-1\colon p_{k,j}\left(x_{j}^{\ast }\right)=1- {{1}\over {M}} \Biggr)\cr& >1-e^{-\left(1- {{1}\over {M}}\right)^{i}N\delta ^{2}/2}>1-e^{-\left(1- {{1}\over {M}}\right)^{n}N\delta^{2}/2}\cr& >1-e^{-\left(1- {{1}\over {\epsilon (n)n}}\right)^{n}\epsilon (n)n\delta ^{2}/2}\cr& \to 1-e^{-e^{-1/\epsilon (n)}\epsilon (n)n\delta ^{2}/2}}$$ which is an overwhelming probability when Formula$n\to \infty$. Since Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1-e^{{1}\over {\epsilon (n)}} {{M}\over {N}}\right)$, we know that Formula TeX Source $$\eqalignno{& r_{k+1}\left((1^{i-1}\ast \ast \cdots\ast 0)\right)\cr& >(1-\delta)\left (1- {{1}\over {M}}\right)^{i}\cr& > (1-\delta)\left (1- {{1}\over {M}}\right)^{n}> {{M}\over {N}}}$$ holds with an overwhelming probability Formula$1-e^{-e^{-1/\epsilon (n)}\epsilon (n)n\delta ^{2}/2}$. At the same time, it is obvious that the individuals Formula$(1^{i-1}\ast \ast \cdots \ast 0)$ have the highest fitness in the population. After truncation selection, according to Lemma 5, we obtain that (note that we use margins for the marginal probabilities) Formula TeX Source $$\eqalignno{& \BBP\Biggl(\forall j\leq i-1\colon p_{k+1,j}\left(x_{j}^{\ast }\right)=1- {{1}\over {M}}\mid p_{k,n}\left(\bar {x}_{n}^{\ast }\right)=1- {{1}\over {M}}, \cr& \quad \quad \forall j\leq i-1\colon p_{k,j}\left(x_{j}^{\ast }\right)=1- {{1}\over {M}} \Biggr) \cr& >1-e^{-e^{-1/\epsilon (n)}\epsilon (n)n\delta ^{2}/2}& \hbox{(38)}}$$ which means that, with an overwhelming probability, the marginal probabilities Formula$p_{\cdot,j}\left(x_{j}^{\ast }\right) (\forall j\leq i-1)$ will no longer change once they reach Formula$1- {{1}\over {M}}$.
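The limit in the last line of the Chernoff estimate above is the standard one; a one-line check using Formula$M=\epsilon (n)n$: Formula TeX Source $$\left(1- {{1}\over {\epsilon (n)n}}\right)^{n}=\left[\left(1- {{1}\over {\epsilon (n)n}}\right)^{\epsilon (n)n}\right]^{{1}\over {\epsilon (n)}}\to \left(e^{-1}\right)^{{1}\over {\epsilon (n)}}=e^{- {{1}\over {\epsilon (n)}}}$$ as Formula$n\to \infty$, which is where the factor Formula$e^{-1/\epsilon (n)}$ in the exponent comes from.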

Now we consider the Formula$(i+1)$th stage Formula$(i\leq n-1)$, at which the Formula$i$th bits of individuals are of interest. Similar to the 1st stage, in which the marginal probability Formula${\mathhat {p}}_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ was investigated, we can estimate the time at which Formula${\mathhat {p}}_{\cdot,i}\left(x_{i}^{\ast }\right)$ reaches Formula$1- {{1}\over {M}}$, i.e., Formula${\mathhat {t}}_{i} (1\leq i< n)$. As presented in Table IX, it is not hard to obtain (34) and (35).

In order to obtain Formula${\mathhat {t}}_{i}$, we need to know Formula${\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)$ so as to solve (34) and (35). It is worth noting that Formula${\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)$ is related to the genetic drift. Similar to what we did in Section IV, when the bits are not exposed to the selection pressure, given that Formula${\mathhat {t}}_{i-1}=O(n)$, the marginal probability Formula${\mathhat {p}}_{\cdot,i}\left(x_{i}^{\ast }\right)$ will maintain a level of at least Formula${{1}\over {e}}$ (see footnote 8). Hence, Formula$p_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)> {{1}\over {e}}$ holds with the overwhelming probability of Formula TeX Source $$\prod _{k=0}^{i-1}\left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{t}}_{k}} \eqno {\hbox{(39)}}$$ where the item Formula TeX Source $$\left (1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{t}}_{k}}$$ represents the probability that the Formula$(k+1)$th marginal probability is at least Formula${{1}\over {e}}$ after genetic drift. Detailed analysis can be found in the proof of Theorem 2.

Table 9
TABLE IX CALCULATION OF (34) AND (35)

Now we can solve the equations given in (34) and (35), and get Formula TeX Source $$\eqalignno{& \quad {\mathhat {t}}_{i}= {\mathhat {t}}_{0}+\displaystyle \sum _{k=1}^{i}({\mathhat {t}}_{k}- {\mathhat{t}}_{k-1}) \cr& < {{(i+1)\left (\ln{{e(M-1)}\over {N}}-\ln (1-\delta)+ {{1}\over {\epsilon (n)}}\right)}\over {\ln(1-\delta)+\ln \left({{N}\over {M}}\right)- {{1}\over {\epsilon (n)}}}} \cr& \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad +2(i+1)& \hbox{(40)}}$$ where Formula$i\leq n-1$ holds.

Next, we need to estimate the joint probability that the random variable Formula$t_{i}$ is upper bounded by Formula${\mathhat {t}}_{i}$. Since similar work has been done in (32) and (33), and (20) in the proof of Theorem 2, we only informally describe it here for the sake of brevity. This joint probability contains four parts.

  1. The probability that Formula$\forall k\in \{1,\ldots,i-1\}\colon t_{k}< {\mathhat {t}}_{k}$. (It can be obtained by induction. For more details, please refer to (20).)
  2. The probability that after genetic drift of the Formula$i$th bit, the marginal probability Formula$p_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)$ is larger than Formula${{1}\over {e}}$. (We have already mentioned it in (39).)
  3. The probability that after the marginal probabilities Formula$p_{\cdot,j}\left(x_{j}^{\ast }\right) (j\ne n)$ have reached Formula$1- {{1}\over {M}}$, they will never drop to a lower level again. (We can utilize the result given in (38).)
  4. The probability that Formula$p_{t,i}\left(x_{i}^{\ast }\right)$ is lower bounded by Formula${\mathhat {p}}_{t,i}\left(x_{i}^{\ast }\right) ({\mathhat {t}}_{i-1}< t\leq {\mathhat {t}}_{i})$, given the condition that Formula$p_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)\geq {\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)$.

Now we briefly estimate the probability mentioned in Item 4 (a more detailed example can be found in Table III in the proof of Theorem 2). As a first step, we consider the relation between Formula$p_{t,i}\left(x_{i}^{\ast }\right)$ and Formula$p_{t-1,i}\left(x_{i}^{\ast }\right) ({\mathhat {t}}_{i-1}< t\leq {\mathhat {t}}_{i})$ by applying Chernoff bounds twice. As a result, we obtain the inequalities presented in Table X, where we utilize “Formula$\min$” to take into account the situation in which Formula$(1-\delta) {{N}\over {M}}p_{t-1,i}\left(x_{i}^{\ast }\right)p_{t-1,n}\left(\bar {x}_{n}^{\ast }\right)\prod _{j=1}^{i-1}p_{t-1,j}\left(x_{j}^{\ast }\right)>1- {{1}\over {M}}$ holds. In this case, noting that the Formula${\rm UMDA}_{M}$ adopts margins, the marginal probability Formula$p_{t,i}\left(x_{i}^{\ast }\right)$ is set to Formula$1- {{1}\over {M}}$. By setting the condition of the above probability as Formula$p_{t-1,i}\left(x_{i}^{\ast }\right)\geq {\mathhat {p}}_{t-1,i}\left(x_{i}^{\ast }\right)= G_{2}^{t- {\mathhat {t}}_{i-1}-1} {\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)$, the above inequality further implies that Formula TeX Source $$\eqalignno{& \quad \BBP\biggl(p_{t,i}\left(x_{i}^{\ast }\right)\geq\min \left\{G_{2}p_{t-1,i}\left(x_{i}^{\ast }\right),1- {{1}\over {M}}\right\}\cr& \quad \quad \quad \mid p_{t-1,i}\left(x_{i}^{\ast }\right)\geq G_{2}^{t- {\mathhat {t}}_{i-1}-1} {\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)\biggr) \cr& >1-e^{-\left(1- {{1}\over {M}}\right)^{n}G_{2}^{t- {\mathhat {t}}_{i-1}-1} {\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)N\delta^{2}/2}\cr& >1-e^{-\left(1- {{1}\over {M}}\right)^{n} {\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)N\delta ^{2}/2}\cr& >1-e^{-\left(1- {{1}\over {M}}\right)^{n}N\delta ^{2}/2e}}$$ holds, where we utilize the fact that Formula${\mathhat {p}}_{{\mathhat {t}}_{i-1},i}\left(x_{i}^{\ast }\right)> {{1}\over {e}}$ holds with an overwhelming probability [a consequence of genetic drift; the original analysis can be found before (27)], and the fact that Formula$G_{2}>1$ [which ensures that Formula${\mathhat {p}}_{t,i}\left(x_{i}^{\ast }\right)$ is monotonically increasing when the time index Formula$t$ satisfies Formula${\mathhat {t}}_{i-1}< t\leq {\mathhat {t}}_{i}$]. As a consequence of the above inequality, similar to Table III in the proof of Theorem 2, we obtain the probability mentioned in Item 4 Formula TeX Source $$\eqalignno{& \qquad \qquad \left(1-e^{-\left(1- {{1}\over {M}}\right)^{n}N\delta ^{2}/2e}\right)^{{\mathhat {t}}_{i}- {\mathhat {t}}_{i-1}}\cr& =\left(1-e^{-e^{-1/\epsilon (n)}\omega (n^{2+\alpha }\log n)\delta ^{2}/2e}\right)^{{\mathhat {t}}_{i}- {\mathhat {t}}_{i-1}}.}$$ Now combining the probabilities mentioned in Items 1, 2, 3, and 4, we obtain that Formula$t_{i}$ is upper bounded by Formula${\mathhat {t}}_{i}$ with probability at least Formula TeX Source $$\eqalignno{& \left(1-n^{-e^{-1/\epsilon (n)}\omega (n^{2+\alpha })\delta ^{2}/2e}\right)^{2 {\mathhat {t}}_{i}}\cr& \quad \cdot \prod _{k=0}^{i-1}\left(1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{t}}_{k}}.}$$ As a result, Formula$t_{n-1}$ is bounded by Formula${\mathhat {t}}_{n-1}$ with the overwhelming probability of Formula TeX Source $$\left(1-n^{-e^{-1/\epsilon (n)}\omega (n^{2+\alpha })\delta ^{2}/2e}\right)^{2 {\mathhat {t}}_{n-1}}\cdot\prod _{k=0}^{n-2}\left(1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2 {\mathhat{t}}_{k}}.$$

Table 10
TABLE X BOUNDING Formula$p_{t,i}\left(x_{i}^{\ast }\right)$ FROM BELOW WITH AN OVERWHELMING PROBABILITY

When all the marginal probabilities Formula$p_{\cdot,i}\left(x_{i}^{\ast }\right) (i\ne n)$ have reached Formula$1- {{1}\over {M}}$, the marginal probability Formula$p_{\cdot,n}\left(\bar {x}_{n}^{\ast }\right)$ becomes smaller and smaller, and the probability of finding the optimum becomes larger and larger.

Now we consider the Formula$(n+1)$th stage, in which two events hold: 1) Formula${\mathhat {p}}_{{\mathhat {t}}_{n-1},n}\left(x_{n}^{\ast }\right)\geq {{1}\over {M}}$ holds; 2) Formula$\forall t> {\mathhat {t}}_{n-1}$ with Formula$t\prec Poly(n)$, Formula$\forall j\leq n-1\colon p_{t,j}\left(x_{j}^{\ast }\right)=1- {{1}\over {M}}$ holds with an overwhelming probability (38). Thus, there is no genetic drift to be taken into account. Meanwhile, the probability of generating the optimum in one sampling of a generation, conditioned on the above two events, is at least Formula$\left(1- {{1}\over {M}}\right)^{n-1} {{1}\over {M}}=e^{-(n-1)/n\epsilon (n)} {{1}\over {M}}$, which implies that if the above two events both happen (which is true in the Formula$(n+1)$th stage), then the optimum will be found within Formula$M\ln ^{2}n$ extra samplings (which generate Formula$M\ln ^{2}n$ new individuals) with the overwhelming probability Formula$1-\left({{1}\over {e}}\right)^{\omega (\ln n)}$. Consequently, after the first Formula$n$ stages, at most Formula${{M}\over{N}}\ln ^{2}n$ generations can guarantee the emergence of the optimum with an overwhelming probability.
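The claim about the extra samplings follows from a geometric-trials argument, sketched here with the per-sample success probability Formula$s=e^{-(n-1)/n\epsilon (n)} {{1}\over {M}}$ from the text: Formula TeX Source $$(1-s)^{M\ln ^{2}n}\leq e^{-sM\ln ^{2}n}=e^{-e^{-(n-1)/n\epsilon (n)}\ln ^{2}n}=\left({{1}\over {e}}\right)^{\omega (\ln n)}$$ since Formula$\epsilon (n)= {{M}\over {n}}\to \infty$ makes Formula$e^{-(n-1)/n\epsilon (n)}\to 1$; hence failing in all Formula$M\ln ^{2}n$ samplings has super-polynomially small probability, and the optimum appears with the stated overwhelming probability.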

Hence, the first hitting time Formula$\tau$ is upper bounded by the deterministic value Formula$\bar {\tau }$ Formula TeX Source $$\eqalignno{& \tau < \bar {\tau }= {{\left(\ln{{e(M-1)}\over {N}}-\ln (1-\delta)\right)n\epsilon (n)+n}\over {\epsilon (n)\ln (1-\delta)+\epsilon (n)\ln \left({{N}\over {M}}\right)-1}}\cr& \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad + {{M}\over{N}}\ln ^{2}n+2n}$$ with an overwhelming probability of at least Formula TeX Source $$\eqalignno{& \left(1-n^{-e^{-1/\epsilon (n)}\omega (n^{2+\alpha })\delta ^{2}/2e}\right)^{2\bar {\tau}}\cr& \quad \cdot \left(1-n^{-\left(1-\left({{1}\over {n}}\right)^{1+ {{\alpha }\over {2}}}\right)^{2}\omega (1)}\right)^{2(n-1)\bar {\tau}}\cr& \quad \cdot \left (1-\left({{1}\over {e}}\right)^{\omega (\ln n)}\right).}$$

The results in this section show that margins can prevent misleading convergence and leave the Formula${\rm UMDA}_{M}$ some chance of finding the global optimum. However, the Formula${\rm UMDA}_{M}$ can no longer converge to the global optimum completely, i.e., the CT becomes infinite. This is an interesting case in which the FHT is bounded polynomially in the problem size while the CT is infinite, and it demonstrates that the FHT is a more appropriate measure of EDAs' time complexity than the CT. It is noteworthy that the idea of margins is quite similar to the Laplace correction [2], which was also proposed to prevent the marginal probabilities from converging prematurely. However, since our purpose here is to demonstrate the influence of forbidding a marginal probability to be 0 or 1, the slight difference between the relaxation and the Laplace correction is not investigated.

SECTION VII

CONCLUSION

In this paper, we utilized the FHT to measure the time complexity of EDAs. Based on the FHT measure, we proposed a classification of problem hardness for EDAs and the corresponding probability conditions. This is the first time the general issues related to the time complexity of EDAs have been discussed theoretically. After that, a new approach to analyzing the FHT of EDAs with finite populations was introduced. Using this approach, we investigated the time complexity of UMDAs as examples. In this paper, UMDAs were analyzed in depth on two problems: LEADINGOnes [37] and BVLeadingOnes. Both problems are unimodal. The latter was derived from the former and inherited its domino convergence property. For the original UMDA, LEADINGOnes is shown to be EDA-easy, while BVLeadingOnes is shown to be EDA-hard. Comparing the theoretical results for EDAs with those for EAs, although the first result is similar to the EAs' (i.e., LEADINGOnes is easy for both), it should be noted that this correspondence does not hold in general. That is, a problem that is easy for EAs can be hard for EDAs, e.g., the BVLeadingOnes problem. However, it remains an open issue to analyze problems that are hard for EAs but easy for EDAs.

If the UMDA is further relaxed by margins, BVLeadingOnes will no longer be EDA-hard. Our analysis shows that the margins help the UMDA avoid wrong convergence and thus significantly improve the performance of the UMDA on BVLeadingOnes. This is the first rigorous time complexity evidence that supports the efficacy of relaxations (corrections) of EDAs.

Finally, although we only analyzed UMDAs, our approach has the potential for analyzing other EDAs with finite populations. The general idea is to find a way to simplify the EDAs and then estimate the probability that this simplification holds. However, since different EDAs may have different characteristics, more work needs to be done to generalize our approach.

APPENDIX

Proof of Lemma 6

According to Chernoff bounds, we have Formula TeX Source $$\eqalignno{& \quad \BBP\Biggl(p_{t,n}\left(\bar {x}_{n}^{\ast }\right)\geq(1-\delta) {{p_{t-1,n}\left(\bar {x}_{n}^{\ast }\right)N}\over {M}}\cr& \quad \quad \quad \quad \quad \quad \mid p_{t-1,n}\left(\bar {x}_{n}^{\ast }\right) \leq{{M}\over {N(1-\delta)}}\Biggr)\cr& > 1- e^{-p_{t-1,n}\left(\bar {x}_{n}^{\ast }\right)N\delta ^{2}/2}, \forall t\leq U}$$ where Formula$\delta \in \left(\max \left\{0,1- {{2M}\over {N}}\right\},1- {{M}\over {N}}\right)$ is a positive constant. Since no global optimum is generated before the Formula$U$th generation, we have Formula TeX Source $${\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)=G^{t}p_{0,n}\left(\bar {x}_{n}^{\ast }\right),\quad \forall t\leq U$$ where Formula$G=(1-\delta) {{N}\over {M}}$, and Formula${\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)$ is deterministic given the initial value Formula$p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {{1}\over {2}}$. Furthermore, setting Formula$t=U$ in the above equation, by calculation we obtain that Formula TeX Source $${\mathhat {p}}_{U,n}\left(\bar {x}_{n}^{\ast }\right)=1.$$ Let Formula${\mathhat {T}}_{n}^{\prime}$ denote the minimal Formula$t$ for Formula${\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)$ to reach 1, then the above equation implies Formula${\mathhat {T}}_{n}^{\prime}\leq U$. We study the probability that the random variable Formula$p_{t,n}\left(\bar {x}_{n}^{\ast }\right)$ is larger than Formula${\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)$. Similar to Table III, Formula$\forall t\leq {\mathhat {T}}_{n}^{\prime}$ we obtain Formula TeX Source $$\eqalignno{& \BBP\left(p_{t,n}\left(\bar {x}_{n}^{\ast }\right)\geq{\mathhat {p}}_{t,n}\left(\bar {x}_{n}^{\ast }\right)\mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\right)\cr& \quad \quad >\left(1- e^{-p_{0,n}\left(\bar {x}_{n}^{\ast }\right)N\delta ^{2}/2}\right)^{t}.}$$
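The claim that Formula${\mathhat {p}}_{U,n}\left(\bar {x}_{n}^{\ast }\right)=1$ follows by a routine calculation (sketched here with Formula${\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {{1}\over {2}}$ and Formula$G=(1-\delta) {{N}\over {M}}>1$): requiring Formula$G^{t} {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\geq 1$ and taking logarithms gives Formula TeX Source $$t\geq {{\ln 2}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}= {{\ln {{2M}\over {N}}-\ln (1-\delta)}\over {\ln (1-\delta)+\ln \left({{N}\over {M}}\right)}}+1=U-1$$ so the marginal probability, which is capped at 1, reaches 1 no later than the Formula$U$th generation.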

By the inequalities in Table XI, we estimate the probability that Formula$T_{n}^{\prime}$ is bounded by Formula${\mathhat {T}}_{n}^{\prime}$, where in (42) the factor Formula$\left(1- e^{-{p_{0,n}\left(\bar {x}_{n}^{\ast }\right)N}\delta ^{2}/2}\right)$ is added since we apply Chernoff bounds at the end of the Formula$\left({\mathhat {T}}_{n}^{\prime}-1\right)$th generation. We then consider the following item: Formula TeX Source $$\eqalignno{& \BBP\left({\mathhat {p}}_{{\mathhat {T}}_{n}^{\prime }-1,n}\left(\bar {x}_{n}^{\ast }\right)> {{M}\over {N(1-\delta)}}\mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\right)\cr& \quad \quad = \BBP\left({\mathhat {p}}_{{\mathhat {T}}_{n}^{\prime }-1,n}\left(\bar {x}_{n}^{\ast }\right)> {{M}\over {N(1-\delta)}}\right).}$$ According to the definition of Formula${\mathhat {T}}_{n}^{\prime}$, and noting that Formula${\mathhat {p}}_{{\mathhat {T}}_{n}^{\prime}-1,n}\left(\bar {x}_{n}^{\ast }\right)> {{M}\over {N(1-\delta)}}$ is deterministic, we know that the probability above is 1. Thus, we continue to estimate the corresponding probability mentioned in (41) Formula TeX Source $$\eqalignno{& \quad \BBP\left(T_{n}^{\prime }\leq {\mathhat{T}}_{n}^{\prime } \mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)={\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\right)\cr& >\BBP\biggl(p_{{\mathhat {T}}_{n}^{\prime }-1,n}\left(\bar{x}_{n}^{\ast }\right)\geq {\mathhat {p}}_{{\mathhat{T}}_{n}^{\prime }-1,n}\left(\bar {x}_{n}^{\ast }\right)\cr& \mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat{p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\biggr)\left(1-e^{- {{{\mathhat {p}}_{0,n}(1)N}\over {2}}\delta ^{2}}\right)\cr&>\left(1-e^{- {{{\mathhat {p}}_{0,n}(1)N}\over {2}}\delta^{2}}\right)^{{\mathhat {T}}_{n}^{\prime}}.}$$ Since Formula${\mathhat {T}}_{n}^{\prime}\leq U$, we further get Formula TeX Source $$\eqalignno{& \BBP\left(T_{n}^{\prime }\leq U \mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\right)\cr& > \BBP\left(T_{n}^{\prime }\leq {\mathhat {T}}_{n}^{\prime } \mid p_{0,n}\left(\bar {x}_{n}^{\ast }\right)= {\mathhat {p}}_{0,n}\left(\bar {x}_{n}^{\ast }\right)\right)\cr& >\left(1- e^{- {{{\mathhat {p}}_{0,n}(1)N}\over {2}}\delta ^{2}}\right)^{U}.}$$ The analysis above tells us that the probability that the marginal probability converges no later than the Formula$U$th generation Formula$(T_{n}^{\prime}\leq U)$ is at least Formula$\left(1- e^{- {{N}\over {4}}\delta ^{2}}\right)^{U}$. Since Formula$N=\omega (n^{2+\alpha }\log n)$, Formula$M=\beta N$ (Formula$\beta \in (0,1)$ is a constant) and Formula$U$ is polynomial in the problem size Formula$n$, this probability is overwhelming. Hence, we have proven the lemma.

TABLE XI CALCULATION OF THE PROBABILITY THAT $T_{n}^{\prime}$ IS UPPER BOUNDED BY $\hat{T}_{n}^{\prime}$

ACKNOWLEDGMENT

The authors are grateful to Prof. J. A. Lozano for his constructive comments. T. Chen would like to thank Dr. J. He for his kind help and suggestions over the years.

Footnotes

This work was supported in part by the National Natural Science Foundation of China under Grants 60533020 and U0835002, the Fund for Foreign Scholars in the University Research and Teaching Programs (111 Project) in China under Grant B07033, and an Engineering and Physical Sciences Research Council Grant EP/C520696/1 in the U.K.

X. Yao is with the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China, and also with the Center of Excellence for Research in Computational Intelligence and Applications, School of Computer Science, University of Birmingham, Edgbaston, Birmingham B15 2TT, U.K. (e-mail: x.yao@cs.bham.ac.uk).

T. Chen, K. Tang and G. Chen are with the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China (e-mail: cetacy@mail.ustc.edu.cn, ketang@ustc.edu.cn, glchen@ustc.edu.cn).

1 For $g(n)\in[0,1]$, there are more detailed asymptotic orders in the interval $[0,1]$:

1) $g(n)\prec\frac{1}{\mathrm{SuperPoly}(n)}$;

2) $\frac{1}{\mathrm{Poly}(n)}\prec g(n)\prec 1-\frac{1}{\mathrm{Poly}(n)}$ [if and only if $\exists a_{1},b_{1},a_{2},b_{2}\in\mathbb{R}^{+}$ and $n_{0},n_{1}\in\mathbb{N}$: $\forall n>\max\{n_{0},n_{1}\}$, $1/\left(a_{1}n^{b_{1}}\right)\leq g(n)\leq 1-1/\left(a_{2}n^{b_{2}}\right)$];

3) $g(n)\succ 1-\frac{1}{\mathrm{SuperPoly}(n)}$ [if and only if $\forall a,b\in\mathbb{R}^{+}$, $\exists n_{0}\in\mathbb{N}$: $\forall n>n_{0}$, $g(n)\geq 1-1/(an^{b})$].

If necessary, these detailed asymptotic orders can be obtained by considering the regions $c\pm\frac{1}{\mathrm{Poly}(n)}$ and $c\pm\frac{1}{\mathrm{SuperPoly}(n)}$, where $0<c<1$; an illustrative example follows.
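For instance (an illustrative example of our own, not part of the original footnote), consider $g_{1}(n)=2^{-n}$, $g_{2}(n)=\frac{1}{2}+\frac{1}{n}$, and $g_{3}(n)=1-2^{-n}$: $g_{1}$ falls into class 1), since $2^{-n}\leq 1/(an^{b})$ for all $a,b\in\mathbb{R}^{+}$ and all sufficiently large $n$; $g_{2}$ falls into class 2), taking $a_{1}=b_{1}=a_{2}=b_{2}=1$ and $n>4$; and $g_{3}$ falls into class 3), by the same exponential-versus-polynomial comparison used for $g_{1}$.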

2 In our discussions, "deterministic" is always in the sense that we have fixed the initial values of all the parameters of the non-self-adaptive EDA.

3 The first inequality can be found in [38, Corollary 1.1] (a similar form appears in [21]), and the second inequality is [38, (3.3)].

4 Given the values of the population sizes and the constant $\delta$, the value of $\bar{\tau}$ is determined by the problem size $n$. Thus, $\bar{\tau}$ is not a random variable.

5 The notation "$[\,\cdot\,]$" can be interpreted as follows: given $a>1$, $[a]=1$; given $a\in(0,1)$, $[a]=a$ (for example, $[1.2]=1$, while $[0.3]=0.3$). For the sake of brevity, we will omit this notation but implicitly restrict the value of a probability not to exceed 1 in the remainder of the paper.

6 When there is no selection pressure, the proportion of alleles in a finite population will fluctuate due to the errors introduced by random sampling. For more details, one can refer to [6], [41]. A minimal simulation sketch of this effect is given below.
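To illustrate the fluctuation described above (a simulation sketch of our own; the function name and parameter values are hypothetical and not from the paper), the following Python snippet tracks the marginal frequency of allele 1 at a single locus when each generation is resampled from the current frequency and no selection is applied:

    import random

    def drift_one_bit(p0=0.5, pop_size=20, generations=200, seed=1):
        """Track the marginal frequency of allele 1 at one locus when each
        new population is sampled from the current frequency and no
        selection is applied (pure random-sampling error)."""
        random.seed(seed)
        p = p0
        history = [p]
        for _ in range(generations):
            # Sample pop_size genes independently, each being 1 with probability p.
            ones = sum(random.random() < p for _ in range(pop_size))
            # With no selection pressure, the next marginal probability is just
            # the sampled frequency, so it performs a random walk that is
            # eventually absorbed at 0 or 1.
            p = ones / pop_size
            history.append(p)
            if p in (0.0, 1.0):
                break
        return history

    if __name__ == "__main__":
        trajectory = drift_one_bit()
        print("absorbed at", trajectory[-1], "after", len(trajectory) - 1, "generations")

Increasing pop_size slows the fluctuation, which is consistent with the role the large population size $N=\omega(n^{2+\alpha}\log n)$ plays in the analysis above: with a sufficiently large population, the marginal probabilities of the not-yet-converged bits stay above a constant level (e.g., $1/e$ in footnote 8) for polynomially many generations with an overwhelming probability.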

7 For the sake of brevity, we assume that $\log^{2}n$ is an integer and thus omit the notation "$\lceil\,\cdot\,\rceil$."

8 For the sake of brevity, we write the results of the different stages together. It is noteworthy that the proof here contains no loop, since we can prove the result for each value of $i$ ($i=1,\ldots,n-1$ is the index of bits) one after another, as we did in Theorem 2. As in the case of Theorem 2, since $\hat{t}_{i}-\hat{t}_{i-1}=\Theta(1)$ for all $i=1,\ldots,n-1$, the sum of at most $i$ such items [see (40)] is always $O(n)$, and the impact of genetic drift can be estimated as in Theorem 2 for the $(i+1)$th bit: a level of at least $1/e$ can be maintained with an overwhelming probability. The telescoping sum behind the $O(n)$ claim is spelled out below.
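Spelling out the $O(n)$ claim (our own expansion of the telescoping sum; we assume $\hat{t}_{0}=\Theta(1)$, which this footnote does not state explicitly): $$\hat{t}_{i}=\hat{t}_{0}+\sum_{j=1}^{i}\left(\hat{t}_{j}-\hat{t}_{j-1}\right)=\Theta(1)+i\cdot\Theta(1)=O(n),\quad\text{since }i\leq n-1.$$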


Authors

Tianshi Chen

Tianshi Chen (S'07) received the B.S. degree in mathematics from the Special Class for the Gifted Young, University of Science and Technology of China (USTC), Hefei, Anhui, China, in 2005. He is currently working toward the Ph.D. degree in computer science from the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, USTC.

His research interests include theoretical aspects of evolutionary algorithms, various real-world applications of evolutionary algorithms, and theoretical aspects of parallel computing.

Ke Tang

Ke Tang (S'05–M'07) received the B.E. degree from the Department of Control Science and Engineering, Huazhong University of Science and Technology, Wuhan, China, in 2002, and the Ph.D. degree from the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, in 2007.

Since 2007, he has been an Associate Professor with the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui, China. He is the coauthor of more than 30 refereed publications. His major research interests include machine learning, pattern analysis, evolutionary computation, data mining, metaheuristic algorithms, and real-world applications.

Dr. Tang is an Editorial Board Member of three international journals and the Chair of the IEEE Task Force on Large Scale Global Optimization.

Guoliang Chen

Guoliang Chen received the B.S. degree from Xi'an Jiaotong University, Xi'an, China, in 1961.

Since 1973, he has been with the University of Science and Technology of China, Hefei, Anhui, China, where he is currently the Academic Committee Chair of the Nature Inspired Computation and Applications Laboratory, a Professor with the School of Computer Science and Technology, and the Director of the School of Software Engineering. From 1981 to 1983, he was a Visiting Scholar with Purdue University, West Lafayette, IN. He is currently also the Director of the National High Performance Computing Center, Hefei, Anhui, China. He has published nine books and more than 200 research papers. His research interests include parallel algorithms, computer architecture, computer networks, and computational intelligence.

Prof. Chen is an Academician of the Chinese Academy of Sciences. He was the recipient of the National Excellent Teaching Award of China in 2003.

Xin Yao

Xin Yao (M'91–SM'96–F'03) received the B.S. degree from the University of Science and Technology of China (USTC), Hefei, Anhui, China, in 1982, the M.S. degree from the North China Institute of Computing Technology, Beijing, China, in 1985, and the Ph.D. degree from USTC, in 1990, all in computer science.

From 1985 to 1990, he was an Associate Lecturer and Lecturer with USTC, while working toward the Ph.D. degree in simulated annealing and evolutionary algorithms. In 1990, he was a Postdoctoral Fellow with the Computer Sciences Laboratory, Australian National University, Canberra, Australia, where he continued his work on simulated annealing and evolutionary algorithms. In 1991, he was with the Knowledge-Based Systems Group, Commonwealth Scientific and Industrial Research Organization Division of Building, Construction and Engineering, Melbourne, Australia, where he worked primarily on an industrial project on automatic inspection of sewage pipes. In 1992, he returned to Canberra to take up a Lectureship with the School of Computer Science, University College, University of New South Wales, Australian Defence Force Academy, Canberra, Australia, where he was later promoted to a Senior Lecturer and Associate Professor. Attracted by the English weather, he moved to the University of Birmingham, Edgbaston, Birmingham, U.K., where he became a Professor (Chair) of computer science on April 1, 1999. He is currently the Director of the Center of Excellence for Research in Computational Intelligence and Applications, School of Computer Science, University of Birmingham. He is currently also a Changjiang (Visiting) Chair Professor (Cheung Kong Scholar) with the Nature Inspired Computation and Applications Laboratory, School of Computer Science and Technology, USTC. He has given more than 50 invited keynote and plenary speeches at conferences and workshops worldwide. He has more than 300 refereed publications. His major research interests include evolutionary artificial neural networks, automatic modularization of machine-learning systems, evolutionary optimization, constraint-handling techniques, computational time complexity of evolutionary algorithms, coevolution, iterated prisoner's dilemma, data mining, and real-world applications.

Dr. Yao was the Editor-in-Chief of the IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION from 2003 to 2008, an Associate Editor or Editorial Board Member of 12 other journals, and the Editor of the World Scientific Book Series on Advances in Natural Computation. He was the recipient of the President's Award for Outstanding Thesis by the Chinese Academy of Sciences for his Ph.D. work on simulated annealing and evolutionary algorithms in 1989. He was the recipient of the 2001 IEEE Donald G. Fink Prize Paper Award for his work on evolutionary artificial neural networks.
