Analysis of Evolutionary Algorithms on Fitness Function with Time-linkage Property

In real-world applications, many optimization problems have the time-linkage property, that is, the objective function value relies on the current solution as well as the historical solutions. Although the rigorous theoretical analysis on evolutionary algorithms has rapidly developed in recent two decades, it remains an open problem to theoretically understand the behaviors of evolutionary algorithms on time-linkage problems. This paper takes the first step to rigorously analyze evolutionary algorithms for time-linkage functions. Based on the basic OneMax function, we propose a time-linkage function where the first bit value of the last time step is integrated but has a different preference from the current first bit. We prove that with probability $1-o(1)$, randomized local search and $(1+1)$ EA cannot find the optimum, and with probability $1-o(1)$, $(\mu+1)$ EA is able to reach the optimum.


Introduction
Evolutionary Algorithms (EAs), a category of stochastic optimization algorithms inspired by the Darwinian principle of natural selection, have been widely utilized in real-world applications. Although EAs are simple and efficient to use, the theoretical understanding of their working principles and complexity is far behind their practical usage, owing to the difficulty of mathematically analyzing their stochastic and iterative process.
In order to fundamentally understand EAs and ultimately design efficient algorithms in practice, researchers begin the rigorous analysis with functions of simple and clear structure, mainly pseudo-Boolean functions (e.g., OneMax, LeadingOnes, BinaryValue) and classic combinatorial optimization problems (e.g., the minimum spanning tree problem). Despite the increasing attention and insightful theoretical analyses in recent decades, there remain many important open areas that have not been considered in the evolutionary theory community.
One important open issue concerns time-linkage problems. A time-linkage problem is an optimization problem in which the objective function relies not only on the solutions of the current time but also on the historical ones. In other words, the current decisions also influence the future. There are plenty of applications with the time-linkage property, for example, temporal credit assignment in reinforcement learning, dynamic optimal car navigation based on real-time traffic information, optimal watering scheduling to improve the quality of crops along with the weather change, optimal land-use decisions of farmers according to the economic and natural environment, and optimal configuration of a supply chain satisfying dynamic customer demand.
Time-linkage optimization problems can be tackled offline or online, depending on the situation. If the problem pursues an overall solution with a sufficient time budget and the time-linkage dynamics can be integrated into a static objective function, then the problem can be solved offline. However, among the static problems with established theoretical understanding, whether OneMax, LeadingOnes, and other widely analyzed pseudo-Boolean functions [DJW02], or Minimum Spanning Tree, Eulerian Cycle, and other widely analyzed combinatorial problems [NW10], no static benchmark function in the evolutionary theory community is time-linkage, to the best of our knowledge.
Another situation that real-world applications often encounter is that the problem must be solved online as time goes by. Such time-linkage online problems belong to dynamic optimization [Ngu11]. As pointed out in the survey [Ngu11], the whole evolutionary community, not only the evolutionary theory community, lacks research on such real-world problems. Looking further into the theoretical research on dynamic optimization, the dynamic problems analyzed in the theory community mainly include Dynamic OneMax [Dro02], Magnitude and Balance [RLY09], Maze [KM12], and the Bi-stable problem [JZ15] for dynamic pseudo-Boolean functions, as well as dynamic combinatorial problems such as the single-destination shortest path problem [LW15], makespan scheduling [NW15], the vertex cover problem [PGN15], subset selection [RNNF19], and graph coloring [BNPS19]. To the best of our knowledge, there is no theoretical analysis of a dynamic time-linkage fitness function; indeed, no dynamic time-linkage pseudo-Boolean function has even been proposed for theoretical analysis.
The main contributions of this paper can be summarized as follows. This paper takes a first step towards understanding EAs on time-linkage functions. When solving a time-linkage problem by EAs in an offline mode, the first question faced by practitioners is how to encode the solution. There are two straightforward encodings; take an objective function relying on solutions of two time steps as an example. One way is to simply ignore the time-linkage dependency and solve a non-time-linkage function with double the problem size. The other way is to respect the time-linkage dependency: encode the solution with the original problem size, but store the solutions generated in the previous time steps for the fitness evaluation. Researchers and practitioners will ask whether the encoding really matters. When solving a time-linkage problem in an online mode, engineers need to know, before conducting experiments, whether the algorithm they use can solve the problem at all. Hence, in this paper, we design a time-linkage toy function based on OneMax to shed some light on these questions. This function, called OneMax$_{(0,1^n)}$ where $n$ is the dimension size, is the sum of two components: the OneMax fitness of the current $n$-dimensional solution, and the value of the first dimension of the previous solution multiplied by minus the dimension size. The design of this function captures the situation in which the current solution prefers the opposite of the value preferred for the previous solution, which better exposes the influence of different encodings. It could also be the core element of some dynamic time-linkage functions, used in the situation that at each time step we only optimize the current state of the online problem within a limited time, so that the analysis of this function may also offer some insights into the yet undeveloped theory for dynamic time-linkage functions.
The remainder of this paper is organized as follows. In Section 2, we introduce the motivation for and details of the designed OneMax$_{(0,1^n)}$. Section 3 shows the theoretical results on RLS and the (1 + 1) EA on OneMax$_{(0,1^n)}$, and our theoretical results on the (µ + 1) EA are shown in Section 4. Our conclusion is summarized in Section 5.

OneMax$_{(0,1^n)}$
For the first time-linkage problem for theoretical analysis, we expect the function to be simple and clearly structured. OneMax, which counts the total number of ones in a bit string, is considered one of the simplest pseudo-Boolean functions and is a well-understood benchmark in the evolutionary theory community for static problems. Choosing it as a base function to which the time-linkage property is added facilitates the theoretical understanding of this property. Hence, the time-linkage function discussed in this paper is based on OneMax. In the OneMax function, each dimension has the same importance and the same preference for the bit value 1. We would like to expose the difference, or more aggressively the difficulty, that the time-linkage property causes, which could better help us understand the behavior of EAs on time-linkage problems. Therefore, we introduce the solutions of previous steps but with a different importance and preference. For simplicity of analysis, we only introduce the value of one dimension, say the first one, of the last time step into the objective function, with weight $-n$, where $n$ is the dimension size. More specifically, this function $f: \{0,1\} \times \{0,1\}^n \to \mathbb{Z}$ is defined by
$$f(x^{g-1}, x^g) = \sum_{i=1}^{n} x^g_i - n x^{g-1}_1 \qquad (1)$$
for two consecutive $x^{g-1} = (x^{g-1}_1, \ldots, x^{g-1}_n)$ and $x^g = (x^g_1, \ldots, x^g_n) \in \{0,1\}^n$. Clearly, (1) consists of two components: the OneMax component relying on the current individual, and the drawing-back component determined by the first bit value of the previous individual. If our goal is to maximize (1), it is not difficult to see that the optimum is unique and the maximum value $n$ is reached if and only if $(x^{g-1}_1, x^g) = (0, 1^n)$. Hence, we integrate $(0, 1^n)$ into the name and call (1) the OneMax$_{(0,1^n)}$ function.
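A direct implementation of (1) can make the definition concrete (a minimal sketch; the function name and the 0/1-list representation are our illustrative choices):

```python
def onemax_01n(x_prev, x_cur):
    """Fitness (1): OneMax of the current solution minus n times the
    first bit value of the previous solution."""
    n = len(x_cur)
    return sum(x_cur) - n * x_prev[0]
```

The unique optimum is $(x^{g-1}_1, x^g) = (0, 1^n)$ with value $n$; for instance, `onemax_01n([0] * 8, [1] * 8)` returns 8, while `onemax_01n([1] + [0] * 7, [1] * 8)` returns 0.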

Some Notes
A time-linkage optimization problem can be solved offline or online, depending on the situation. If we solve the designed OneMax$_{(0,1^n)}$ function in an offline mode, the optimal solution that maximizes (1) is $(x^{g-1}_1, x^g) = (0, 1^n)$. In this case, two straightforward representations need to be considered: one ignores the time-linkage fact and encodes an $(n+1)$-bit string as one solution, since we only require one bit value from previous time steps; the other encodes an $n$-bit string as one solution and stores the previous results for objective function evaluation. It is easy to see that with the first representation, the algorithms considered in the following sections (the (1 + 1) EA and the (µ + 1) EA) will not encounter any stagnation and will surely solve the OneMax$_{(0,1^n)}$ function. The problem then directly reduces to a traditional non-time-linkage linear function and is thus not of interest for our topic. The second representation is more interesting to us, since OneMax$_{(0,1^n)}$ then remains a true time-linkage function and we can figure out how EAs react to the time-linkage property. Hence, the later sections only consider the second representation when the OneMax$_{(0,1^n)}$ function is analyzed.
If we relate the OneMax$_{(0,1^n)}$ function to online dynamic optimization problems, we can also regard it as one piece of an objective function that considers the overall results during a given time period, where at each time step we only optimize the current piece. For example, consider the following dynamic online problem, where $x = (x^2, \ldots, x^g)$, $x^t = (x^t_1, \ldots, x^t_n) \in \{0,1\}^n$ for $t = 0, 1, \ldots, g$, and the initial $x^0$ and $x^1$ are given. The goal is to find the time step $g$ at which (2) has a value greater than $n - 1$. Since the pieces from the time steps $1, \ldots, g-2$ can contribute at most $\sum_{t=2}^{g} \exp(-g+t-1) \le 1/(e-1)$ in value, the goal can be transferred to finding the time step at which the component of the current and the last step, that is, OneMax$_{(0,1^n)}$, has the value $n$. Thus, if we take the strategy for online optimization of optimizing the present at each time, as discussed in [Bos05, Section 3], that is, for the current time $g_{\mathrm{cur}}$ we optimize $h(x, g_{\mathrm{cur}})$ knowing $x^0, \ldots, x^{g_{\mathrm{cur}}-1}$, then the problem can functionally be regarded as maximizing the OneMax$_{(0,1^n)}$ function as time goes by.
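The geometric bound on the contribution of the older pieces can be checked numerically (an illustrative sketch; the function name is ours):

```python
import math

def discounted_tail(g):
    # Total weight exp(-g + t - 1) for t = 2..g: a geometric series
    # whose value stays below 1/(e - 1) for every g.
    return sum(math.exp(-g + t - 1) for t in range(2, g + 1))

# The bound holds uniformly in g.
assert all(discounted_tail(g) <= 1 / (math.e - 1) for g in (2, 10, 100))
```

This is why, for large $g$, only the component of the current and the last step matters for exceeding the threshold $n - 1$.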
In conclusion, we note that for the representation encoding an $n$-bit string in the offline manner and for optimizing the present in the online dynamic manner, the algorithms used in these two situations are the same, only with different backgrounds and descriptions of the operators. The details will be discussed when they arise in Sections 3 and 4.

RLS and (1 + 1) EA Cannot Find the Optimum

3.1 RLS and (1 + 1) EA Utilized for OneMax$_{(0,1^n)}$

The (1 + 1) EA is the simplest EA and is frequently analyzed as a benchmark algorithm in the evolutionary theory community; randomized local search (RLS) can be regarded as a simplification of the (1 + 1) EA and thus as a pre-step towards the theoretical understanding of the (1 + 1) EA. Both algorithms have only one individual in their population. Their difference lies in the mutation. In each generation, the (1 + 1) EA employs bit-wise mutation on the individual, that is, each bit is independently flipped with probability $1/n$, where $n$ is the problem size, while RLS employs one-bit mutation, that is, exactly one of the $n$ bits is chosen uniformly at random and flipped. For both algorithms, the generated offspring replaces its parent as long as it has at least the same fitness as its parent.
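The two mutation operators just described can be sketched as follows (a minimal illustration; the function names and list representation are ours):

```python
import random

def one_bit_mutation(x):
    """RLS mutation: flip exactly one uniformly chosen bit."""
    y = list(x)
    i = random.randrange(len(y))
    y[i] ^= 1
    return y

def bitwise_mutation(x):
    """(1 + 1) EA mutation: flip each bit independently with prob. 1/n."""
    n = len(x)
    return [b ^ 1 if random.random() < 1 / n else b for b in x]
```

One-bit mutation always produces an offspring at Hamming distance exactly 1, while bit-wise mutation flips one bit in expectation but may flip none or several.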
The general RLS and (1 + 1) EA are designed for non-time-linkage functions; they consider neither the choice of the individual representation nor the requirement to make a decision in a short time. We need some small modifications of RLS and the (1 + 1) EA to handle the time-linkage OneMax$_{(0,1^n)}$ function. The first issue, the representation choice, only arises when the problem is solved in an offline mode. As mentioned in Section 2.2, of the two representation options, we only consider the one that encodes the current solution and stores the historical solutions for fitness evaluation. Algorithm 1 and Algorithm 2 respectively show our modified (1 + 1) EA and RLS for solving OneMax$_{(0,1^n)}$; we shall still use the names (1 + 1) EA and RLS in this paper, with no risk of confusion. In this setting, the goal on OneMax$_{(0,1^n)}$ is to find $1^n$ as the current solution while the stored first bit value of the last generation is 0. In practice, some termination criterion is employed once the practical requirement is met; since we aim at theoretically analyzing the time to reach the optimum, we do not set a termination criterion here. One may note that the notation of the individual has a subscript "1", which denotes that there is only one individual in each generation, and which keeps the notation consistent with that of the (µ + 1) EA discussed in Section 4.
Algorithm 1 (1 + 1) EA to maximize a fitness function f requiring two consecutive time steps
1: Generate $X^0_1$ and $X^1_1$ uniformly at random
2: for $g = 1, 2, \ldots$ do
3: Generate $\tilde{X}^g$ via independently flipping each bit value of $X^g_1$ with probability $1/n$
%% Selection
4: $(X^g_1, X^{g+1}_1) = (X^{g-1}_1, X^g_1)$, if $(X^{g-1}_1, X^g_1)$ has the better fitness; $(X^g_1, \tilde{X}^g)$, if $(X^g_1, \tilde{X}^g)$ has better or as good fitness as $(X^{g-1}_1, X^g_1)$
5: end for

Algorithm 2 RLS to maximize a fitness function f requiring two consecutive time steps
1: Generate $X^0_1$ and $X^1_1$ uniformly at random
2: for $g = 1, 2, \ldots$ do
3: Uniformly and randomly select one index $i$ from $\{1, \ldots, n\}$
4: Generate $\tilde{X}^g$ via flipping the $i$-th bit value of $X^g_1$
%% Selection
5: $(X^g_1, X^{g+1}_1) = (X^{g-1}_1, X^g_1)$, if $(X^{g-1}_1, X^g_1)$ has the better fitness; $(X^g_1, \tilde{X}^g)$, if $(X^g_1, \tilde{X}^g)$ has better or as good fitness as $(X^{g-1}_1, X^g_1)$
6: end for

The second issue, the requirement to make a decision in a short time, arises when the problem is solved in an online mode. In detail, consider the case discussed in Section 2.2 where at each time step we only optimize the present. If the time to make a decision is large enough that the (1 + 1) EA or RLS can solve the $n$-dimensional problem (the OneMax function), then we would obtain $X^t_1 = (1, \ldots, 1)$ at every time step $t$. Obviously, in this case, the obtained sequence $\{X^1_1, \ldots, X^t_1\}$ leads to a fitness less than 1 at any time step $t$; thus we cannot achieve our goal and this case is not interesting. If the time to make a decision is so small that we cannot solve the $n$-dimensional OneMax function, we can only expect to find some result with better fitness at each time step. That is, we utilize the (1 + 1) EA or RLS to solve the OneMax$_{(0,1^n)}$ function, and the evolution process moves on only once some offspring with better fitness appears. In this case, we can reuse Algorithm 1, noting that the generation counter $g$ need not coincide with the time step $t$ of the fitness function, since the (1 + 1) EA or RLS may need more than one generation to obtain an offspring with better fitness for one time step.
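Under these conventions, Algorithm 1 can be sketched in Python (an illustrative reading of the pseudocode; the function names and the budget parameter are ours, and the acceptance rule follows the selection step of the algorithm):

```python
import random

def f(x_prev, x_cur):
    # Fitness (1): OneMax of the current solution, penalized by n
    # times the previous first bit value
    return sum(x_cur) - len(x_cur) * x_prev[0]

def one_plus_one_ea(n, budget):
    """Modified (1 + 1) EA: the pair (previous, current) is stored, and
    the offspring pair is accepted if its fitness is at least as good."""
    x_prev = [random.randint(0, 1) for _ in range(n)]
    x_cur = [random.randint(0, 1) for _ in range(n)]
    for _ in range(budget):
        y = [b ^ 1 if random.random() < 1 / n else b for b in x_cur]
        if f(x_cur, y) >= f(x_prev, x_cur):
            x_prev, x_cur = x_cur, y
    return x_prev, x_cur
```

Replacing the bit-wise mutation line with a single uniformly chosen bit flip yields the corresponding sketch of Algorithm 2 (RLS).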
In a word, whether the (1 + 1) EA and RLS are used to solve OneMax$_{(0,1^n)}$ offline or online, the theoretical analysis only considers Algorithm 1 and Algorithm 2, without mentioning the solving mode and regardless of the explanations of the different backgrounds.

3.2 Convergence Analysis of RLS and (1 + 1) EA on OneMax$_{(0,1^n)}$

This subsection shows that with high probability RLS and the (1 + 1) EA cannot find the optimum of OneMax$_{(0,1^n)}$. Obviously, OneMax$_{(0,1^n)}$ has two goals to achieve: one is to find all 1s in the current string, and the other is to find the optimal pattern (0, 1) in the first bit, that is, the current first bit value goes to 1 while the previous first bit value is 0. The two goals are somewhat contradictory, so the single individual in the population of RLS and the (1 + 1) EA causes poor fault tolerance. In detail, as we will show in the proof of Theorem 2, the population cannot be further improved once either goal is achieved before the optimum is found. Before establishing this result, we first estimate, for one iteration of the (1 + 1) EA whose parent individual has $a$ zeros, the conditional probability that the number of ones increases by exactly 1, given that it increases.
Lemma 1. Let $a$ be the number of zeros in the parent individual. In one iteration of the (1 + 1) EA, conditioned on the event that the number of ones increases, the number of ones increases by exactly 1 with probability at least $1 - ea/n$.
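As an illustrative numerical check (not part of the formal argument), this conditional probability can be computed exactly for small parameters by summing binomial probabilities; the helper names below are ours:

```python
import math

def binom_pmf(k, m, p):
    # Binomial(m, p) probability mass at k
    return math.comb(m, k) * p ** k * (1 - p) ** (m - k)

def cond_prob_increase_is_one(n, a):
    """Pr[number of ones increases by exactly 1 | it increases] for a
    parent with a zeros under bit-wise mutation with rate 1/n."""
    p = 1 / n
    dist = {}  # distribution of (zeros flipped to 1) - (ones flipped to 0)
    for i in range(a + 1):
        for j in range(n - a + 1):
            w = binom_pmf(i, a, p) * binom_pmf(j, n - a, p)
            dist[i - j] = dist.get(i - j, 0.0) + w
    gain = sum(v for d, v in dist.items() if d > 0)
    return dist[1] / gain
```

For instance, with $n = 50$ and $a = 3$ the computed value comfortably exceeds the bound $1 - ea/n \approx 0.84$.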
Now we are ready to show the behavior of RLS and the (1 + 1) EA optimizing the OneMax$_{(0,1^n)}$ function.

Theorem 2. With probability $1 - o(1)$, RLS and the (1 + 1) EA cannot find the optimum of OneMax$_{(0,1^n)}$, regardless of the number of generations.
Proof. The following proof does not specifically distinguish RLS from the (1 + 1) EA, due to their similarity, and discusses the algorithms separately only where their behaviors differ.
To begin with, we point out two cases such that, once one of them happens before the optimum is reached, neither RLS nor the (1 + 1) EA can find the optimum in any further generation.

The first case (Event I) is that the (0, 1) optimal pattern of the first bit is found while the current individual is not yet $1^n$, that is, for some generation $g_0$, $(X^{g_0-1}_{1,1}, X^{g_0}_{1,1}) = (0, 1)$ and $X^{g_0}_1 \neq 1^n$. In this case, the current fitness satisfies $f(X^{g_0-1}_1, X^{g_0}_1) \ge 1$, while, since $X^{g_0}_{1,1} = 1$, any mutation outcome $\tilde{X}^{g_0}$ together with $X^{g_0}_1$ has fitness $f(X^{g_0}_1, \tilde{X}^{g_0}) \le 0$. Hence no offspring can enter the next generation, and RLS or the (1 + 1) EA gets stuck.
The other case (Event II) is that the current individual has reached the optimum of the OneMax component while the previous first bit value is already lost, that is, for some generation $g_0$, $(X^{g_0-1}_{1,1}, X^{g_0}_1) = (1, 1^n)$. In this case, the current fitness is $f(X^{g_0-1}_1, X^{g_0}_1) = 0$. Similarly to the above case, since $X^{g_0}_{1,1} = 1$, every possible mutation outcome $\tilde{X}^{g_0}$ together with $X^{g_0}_1$ has fitness less than or equal to 0. Hence, $\tilde{X}^{g_0}$ can only enter the next generation if $f(X^{g_0}_1, \tilde{X}^{g_0}) = 0$, which means $\tilde{X}^{g_0} = 1^n$. Therefore, RLS or the (1 + 1) EA gets stuck in this case.

Now it remains to show that, starting from the random initial individuals, one of the two cases happens with high probability. For the uniformly and randomly generated first generation, the expected number of zeros among the remaining $n-1$ bits is $(n-1)/2$. Hence, with probability at least $1 - \exp(-(n-1)/8)$, the remaining bits contain at least $n/4$ zeros, and we consider such an initial status in the following.
If $(X^0_{1,1}, X^1_{1,1}) = (0, 1)$, Event I already happens. If $(X^0_{1,1}, X^1_{1,1}) = (0, 0)$, we consider the subsequent process from the moment when the number of 0-bits among the remaining $n-1$ bit positions of the current individual becomes less than $n^c$ for some constant $c < 0.5$. Note that if the first bit value changes from 0 to 1 before the number of 0-bits decreases to $n^c$, Event I already happens. Hence, we only consider the case that the current first bit value is still 0 at the first time when the number of remaining 0-bits decreases below $n^c$. Let $a$ denote the number of 0-bits of the current individual. We will show that in the subsequent generations, with probability at least $1 - o(1)$, the (0, 1) pattern of the first bit is detected before the remaining bits reach the optimal $1^{n-1}$. We conduct the proof based on the following two facts.
• Among increase steps (steps in which the fitness strictly increases), a single increase step increases the fitness by exactly 1 with conditional probability at least $1 - ea/n$. For RLS, due to its one-bit mutation, the fitness can only increase by 1 in a single increase step. For the (1 + 1) EA, Lemma 1 directly shows this fact.
• Under the condition that one step increases the fitness by 1, with conditional probability at least $1/a$, the first bit changes its value from 0 to 1. This is obvious for RLS. For the (1 + 1) EA, suppose that the number of bits changing from 0 to 1 in this step is $m \in [1..a]$; then, by symmetry among the $a$ zeros, the probability that the first bit is one of them is $m/a \ge 1/a$.

Note that if each increase step increases the fitness by 1, there are $a - 1$ increase steps before the remaining $n-1$ positions become all 1s. With the above two facts, it is easy to see that the probability that the (0, 1) pattern is detected before the remaining positions all have bit value 1 is at least $1 - o(1)$.

If $(X^0_{1,1}, X^1_{1,1}) = (1, 0)$, any offspring has better fitness than the current individual and surely enters the next generation. Then with probability $1/n$, the first bit value in the next generation becomes 1, that is, Event I happens. Otherwise, with probability $1 - 1/n$, the process turns into the above discussed $(X^0_{1,1}, X^1_{1,1}) = (0, 0)$ situation. Hence, in this situation, the probability that eventually Event I happens is also at least $1 - o(1)$.

If $(X^0_{1,1}, X^1_{1,1}) = (1, 1)$, then for RLS, since in each iteration only one bit can be flipped, once the first bit is flipped from 1 to 0, the fitness of the offspring is less than its parent's and the offspring cannot enter the next generation. Hence, for RLS, the individual will eventually evolve to $(X^{g_0-1}_{1,1}, X^{g_0}_1) = (1, 1^n)$ for some $g_0 \in \mathbb{N}$, that is, Event II happens. For the (1 + 1) EA, similarly to the $(X^0_{1,1}, X^1_{1,1}) = (0, 0)$ situation, we consider the subsequent process from the moment when the number of 0-bits among the remaining $n-1$ bit positions of the current individual becomes less than $n^c$ for some constant $c < 0.5$, and let $a$ denote the number of 0-bits of the current individual. If the first bit value changes from 1 to 0 before the number of 0-bits decreases below $n^c$, we turn to the $(X^0_{1,1}, X^1_{1,1}) = (1, 0)$ situation. Otherwise, we will show that in the subsequent generations, with probability at least $1 - o(1)$, the (1, 1) pattern is maintained after the remaining bits reach the optimal $1^{n-1}$. Since a conditional probability is not less than the probability of the simultaneous occurrence of the two events, under the condition that the first bit value stays 1, the conditional probability that at least one 0 among the remaining $n-1$ bits is changed to 1 in one generation is at least $(1 - 1/n)^{n-1}(a/n) \ge a/(en)$ for the (1 + 1) EA. Hence, under the condition that the first bit value stays 1, the expected time $T$ until all $n-1$ bit positions have bit value 1 satisfies $E[T \mid \text{the first bit value stays } 1] \le cen\ln n$. With the Chernoff bound [Doe11, Corollary 1.10 (d)], we know that $\Pr[T \ge n^{1+c} \mid \text{the first bit value stays } 1] \le 2^{-n^{1+c}}$.
Noting that only with small probability can an offspring whose first bit value changes from 1 to 0 enter the next generation, we obtain that Event II happens within $n^{1+c}$ generations with probability at least $1 - o(1)$. Finally, considering the above four possible situations, together with the probability of the initial status shown before, we end the proof.

One key reason for the difficulty of the (1 + 1) EA and RLS is that there is only one individual in the population. As we saw in the proof, once the algorithm finds the (0, 1) optimal pattern in the first bit, the progress in the OneMax component cannot be passed on to the next generation; and once the OneMax component reaches its optimum before the (0, 1) optimal first bit pattern, the optimal first bit pattern cannot be obtained any more. In EAs, a population has many benefits for ensuring performance [DJL17, CO19, Sud20]. Accordingly, we would like to know whether introducing a population of not too small size improves the fault tolerance enough to overcome the first difficulty, and helps to overcome the second difficulty, since a $(1, 1^n)$ individual has worse fitness and is thus easily eliminated in the selection. The details are shown in Section 4.

(µ + 1) EA Can Find the Optimum
4.1 (µ + 1) EA Utilized for OneMax$_{(0,1^n)}$

The (µ + 1) EA is a commonly used benchmark algorithm in evolutionary theory analysis; compared with the (1 + 1) EA, whose population size is 1, it maintains a parent population of size µ. In the mutation operator, one parent is selected uniformly at random from the parent population, and bit-wise mutation is applied to this parent to generate its offspring. Then the selection operator removes, uniformly at random, one individual with the worst fitness value from the union of the population and the offspring.
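The generic operators just described can be sketched as one generation step (an illustrative sketch of the standard (µ + 1) EA; the time-linkage bookkeeping of Algorithm 3 is omitted, and the function names are ours):

```python
import random

def mu_plus_one_step(pop, fitness, n):
    """One generation: uniform parent selection, bit-wise mutation,
    then uniform removal of one worst-fitness individual."""
    parent = random.choice(pop)
    child = [b ^ 1 if random.random() < 1 / n else b for b in parent]
    union = pop + [child]
    worst = min(fitness(x) for x in union)
    worst_ids = [i for i, x in enumerate(union) if fitness(x) == worst]
    union.pop(random.choice(worst_ids))  # break ties uniformly at random
    return union
```

Iterating this step keeps the population size at µ while the offspring competes with the whole population.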
Similarly to the (1 + 1) EA discussed in Section 3, the general (µ + 1) EA is designed for non-time-linkage functions, and some small modifications are required for solving time-linkage problems. For solving the OneMax$_{(0,1^n)}$ function in an offline mode, we again consider only the representation in which each individual in the population encodes the current solution and stores the historical solutions for fitness evaluation. Algorithm 3 shows how the (µ + 1) EA solves a time-linkage function that relies on two consecutive time steps. With no risk of confusion, we shall still call this algorithm the (µ + 1) EA. Also note that we do not set a termination criterion in the algorithm statement, as we aim at theoretically analyzing the time to reach the optimum.

7: Remove the pair with the lowest fitness in $(\tilde{P}^{g-1}, \tilde{P}^g)$ uniformly at random
8: $P^{g+1} = \tilde{P}^g$, $P^g = \tilde{P}^{g-1}$
9: else
10: $P^{g+1} = \tilde{P}^g$, $P^g = \tilde{P}^{g-1}$
11: end if
12: end for

When the problem is solved in an online mode, the better offspring generated in one step cannot be regarded as the decision of the next step for the individuals in the parent population other than its own parent. If we have enough budget before the time step changes, then, similarly to the discussion in Section 3.2, we will have fitness less than 1 at any time step, since $X^t = (1, \ldots, 1)$ for each time step $t$; this case is again not interesting for us. Hence, the following analysis only considers the (µ + 1) EA (Algorithm 3) solving the OneMax$_{(0,1^n)}$ function in the offline mode.

4.2 Convergence Analysis of (µ + 1) EA on OneMax$_{(0,1^n)}$

In Section 3, we showed the two cases which, happening before the optimum is reached, cause stagnation of the (1 + 1) EA and RLS on OneMax$_{(0,1^n)}$: one is that the (0, 1) first bit pattern is reached, and the other is that the current individual has value one in all its bits while the previous first bit value is 1. The single individual in the population of the (1 + 1) EA or RLS results in poor tolerance of the algorithm's incorrect trials. This subsection shows that the introduction of a population can increase this tolerance and thus overcome the stagnation; that is, we will show that the (µ + 1) EA finds the optimum of OneMax$_{(0,1^n)}$ with high probability. In order to give an intuitive feeling for why the population helps on OneMax$_{(0,1^n)}$, we discuss it briefly and not entirely rigorously before establishing the rigorous analysis.
Corresponding to the two stagnation cases for the (1 + 1) EA or RLS, the (µ + 1) EA can get stuck only when all individuals have the current first bit value 1, no matter whether the previous first bit value is 0 (as in the first case) or 1 (as in the second case). As discussed in Section 3, an individual with previous first bit value 1 has no fitness advantage over one with previous first bit value 0. Due to the selection operator, the individuals with previous first bit value 1 will soon be replaced by offspring with good fitness. As the process goes on, more precisely within time linear in the population size in expectation, all individuals with previous first bit value 1 die out, and an offspring whose parent has first bit value 1 cannot enter the population (that is, a parent with first bit value 1 is infertile). Hence, the second case cannot take over the whole population and cause stagnation.
As for the first case, in which (0, 1) pattern individuals take over the population, we focus on the evolving process of the best (0, 0) pattern individual, which is fertile, similarly to the runtime analysis of the original (µ + 1) EA in [Wit06]. The best (0, 0) pattern individuals can be incorrectly replaced only by (0, 1) pattern individuals with better or equal fitness, and only after all individuals with fitness worse than the best (0, 0) pattern individual have been replaced. With a sufficiently large population size, such as $\Omega(n)$ for problem size $n$, with high probability, the better (0, 1) pattern individuals cannot take over the whole population, and the (0, 1) pattern individuals with the same fitness as the best (0, 0) pattern individual cannot replace all best (0, 0) individuals while the population contains no individual with fitness worse than the best (0, 0) individual. That is, with high probability the first case does not happen for the (µ + 1) EA. In a word, the population in the (µ + 1) EA increases the tolerance of incorrect trials.

Now we start our rigorous analysis. As one may infer from the above description, the difficulty of the theoretical analysis lies in the combined treatment of the inter-generation dependence (the time-linkage between two generations) and the intra-generation dependence (such as the selection operator). One way to handle this complicated stochastic dependence could be mean-field analysis, that is, a mathematical analysis of a designed simplified algorithm that discards some dependences, together with an experimental verification of the similarity between the simplified algorithm and the original one. It has already been introduced to evolutionary computation theory [DZ20]. However, mean-field analysis is not fully mathematically rigorous. Hence, we do not utilize it here and analyze the original algorithm directly. Mean-field analysis may help for more complicated algorithms and time-linkage problems, and we also hope our analysis could provide some further inspiration for future theoretical work on time-linkage problems.
For the clarity of the main proof, we state some calculations as lemmas in the following.
Lemma 3. Let $a, n \in \mathbb{N}$ with $a < n$, and define the functions $h_1(d)$ and $h_2(d)$; then $h_1(d)$ and $h_2(d)$ are monotonically decreasing.

Proof. Since $h_1 > 0$, and similarly since $h_2 > 0$, the monotonicity of $h_1(d)$ and $h_2(d)$ follows; hence the corresponding $g(a)$ is monotonically decreasing.

Now we are ready to show our result that the (µ + 1) EA can find the optimum of the OneMax$_{(0,1^n)}$ function with high probability.
Proof. For the uniformly and randomly generated generations $P^0$ and $P^1$, the expected number of individual pairs $(X^0_i, X^1_i)$, $i \in [1..\mu]$, with first bit pattern (1, 0) or (1, 1) is $\mu/2$. Via a simple Chernoff inequality, with probability at least $1 - \exp(-\mu/8)$, at most $\frac{3}{4}\mu$ individuals have the pattern (1, 0) or (1, 1). Under this condition, the expected number of pattern (0, 0) individuals in the whole population is at least $\frac{1}{8}\mu$. A simple Chernoff inequality also gives that, under the condition that at most $\frac{3}{4}\mu$ individuals have the pattern (1, 0) or (1, 1), with probability at least $1 - \exp(-\mu/128)$, there are at least $\frac{1}{16}\mu$ individuals with pattern (0, 0) in the initial population. Hence, the probability that the initial population has at most $\frac{3}{4}\mu$ individuals with pattern (1, 0) or (1, 1) and at least $\frac{1}{16}\mu$ individuals with pattern (0, 0) is at least $(1 - \exp(-\mu/8))(1 - \exp(-\mu/128))$. Thus, in the following, we only consider this kind of initial population.
We first show that after $\Theta(\mu)$ generations, the individuals with first bit pattern (1, 0) or (1, 1) are replaced and do not survive in any further generation. That is, Event II, one case that causes stagnation for the (1 + 1) EA, does not happen in the evolution process of the (µ + 1) EA on OneMax$_{(0,1^n)}$. Note that with probability at least $1 - \mu/2^n$, no individual in the first generation $P^1$ with previous first bit value 1 has value $1^n$, that is, all (1, 1) pattern individuals have fitness at most $-1$. Also note that any individual with previous first bit value 0 has fitness at least 0. Thus, any offspring generated from a (0, 0) pattern individual will surely enter the next generation and replace some individual with (1, 0) or (1, 1) pattern. Since there are at least $\frac{1}{16}\mu$ individuals with pattern (0, 0), in each generation, with probability at least $\frac{1}{16}$, one individual with (1, 0) or (1, 1) pattern is replaced. Hence, the expected time to replace all individuals with pattern (1, 0) or (1, 1) is at most $16 \cdot \frac{3}{4}\mu = 12\mu$, since there are at most $\frac{3}{4}\mu$ such individuals. It is also not difficult to see that no offspring with (1, 0) or (1, 1) pattern can be selected into the next generation of a population containing only the (0, 0) and (0, 1) patterns.
Afterwards, only the (0, 1) and (0, 0) first bit patterns can survive in the further evolution. For a population containing only the (0, 1) and (0, 0) first bit patterns, the individuals can be divided into the following three categories.
• Temporarily undefeated individuals. This category refers to the individuals with the (0, 1) first bit pattern that have better fitness than the current best individuals with the (0, 0) first bit pattern. Such individuals cannot be replaced until the best fitness value among individuals with the (0, 0) first bit pattern increases. Besides, the offspring of this category cannot enter into the next generation since its fitness is at most 0.
• Current front individuals. This category refers to the individuals that have the same fitness as the best fitness among individuals with the (0, 0) first bit pattern. An individual in this category can have either the (0, 0) or the (0, 1) first bit pattern.
• Interior individuals. This category contains all other individuals that do not belong to the above two categories.
Similar to Event I for the (1 + 1) EA, the only situation that can cause stagnation is that the (0, 1) first bit pattern takes over the whole population before the optimum is found. More precisely, the stagnation happens when the current population consists only of front individuals and temporarily undefeated individuals, there is only one front individual with the (0, 0) first bit pattern, and this (0, 0) pattern front individual generates a (0, 1) pattern offspring that successfully enters into the next generation. The following will show that this cannot happen with high probability before the optimum is reached, by proving the following two facts.
• Fact I: With high probability, the accumulated number of temporarily undefeated individuals before the optimum is found is O(n).
• Fact II: With high probability, it cannot happen that all (0, 0) pattern front individuals are replaced by (0, 1) pattern individuals while the accumulated number of temporarily undefeated individuals is O(n).
Here we prove Fact I for the case that there are at least two zeros in the individual with the best fitness value among all (0, 0) pattern individuals. For the current population, let a denote the number of zeros in an individual with the highest fitness among all (0, 0) individuals. Let m_d denote the set of (0, 0) individuals that have a + d zeros; obviously, m_0 is the set of the best (0, 0) individuals. Let A represent the event that the best (0, 0) fitness of the population increases in one generation, and B the event that a (0, 1) offspring with better fitness than the current best (0, 0) fitness is generated in one generation. Firstly, we discuss what happens when event B occurs with a parent not from the m_{>a} individuals and a ≥ 2. Let B′ represent the event that one of the m_{>a} individuals generates a (0, 1) offspring with better fitness than the current best (0, 0) fitness in one generation; in the resulting bound on the probability of B without B′, the last inequality uses a ≥ 2. Secondly, we consider the case when the parent is selected from the m_{>a} individuals, that is, event B′ happens, and bound Pr[B′] via Lemma 3. We then distinguish the two cases a ≥ n^c and a < n^c for any given constant c ∈ (0, 1); a simple Chernoff inequality on the initial population handles the second case with high probability. Hence, from (3) and (4), we obtain the claimed bound on Pr[B].
Then if we consider the subprocess that merely consists of events A and B, we have Pr[A | A ∪ B] ≥ 1/(eµ/n^{2−c} + 3e + 2). Let X be the number of iterations in which B happens in the subprocess before A occurs n times; then E[X] ≤ (eµ/n^{2−c} + 3e + 1)n. With the Chernoff bound for the sum of geometric variables (Theorem 1.10.32(a) in [Doe20]), it is easy to derive that, for any positive constant δ, along with µ ≤ n^{2−c}, with probability at least 1 − exp(−δ^2(n − 1)/(2(1 + δ))), there are at most (1 + δ)(eµ/n^{2−c} + 3e + 1)n ≤ (1 + δ)(4e + 1)n accumulated temporarily undefeated (0, 1) pattern individuals before A occurs n times, hence before the optimum is found. Now we prove Fact II, that is, with high probability, it cannot happen that all (0, 0) pattern front individuals are replaced by (0, 1) pattern individuals while the accumulated number of temporarily undefeated individuals is O(n). Here, we also discuss the case that there are at least two zeros in the individual with the best fitness value among all (0, 0) pattern individuals. Note that a (0, 0) pattern front individual can be replaced only when there are no interior individuals, that is, when all individuals with worse fitness than the current best (0, 0) pattern individual have been removed; otherwise, some interior individual instead of the (0, 0) front individual will be replaced by the new (0, 1) offspring. Since we already discussed the temporarily undefeated individuals in Fact I, in the following we only consider the (0, 1) pattern individuals with the same fitness as the (0, 0) pattern front individuals as candidates to replace the (0, 0) pattern front individuals. We establish the proof by showing that, by the time all interior individuals are replaced, there are at least n^{0.5} more (0, 0) front individuals with high probability, and that, with high probability, these (0, 0) front individuals cannot all be replaced before a better (0, 0) offspring is generated.
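The tail bound for sums of independent geometric random variables used above can be illustrated empirically. The following sketch (with illustrative parameters, not those of the proof) estimates how often such a sum exceeds (1 + δ) times its mean:

```python
import random

def geometric(p, rng):
    """Trials until the first success, support {1, 2, ...}, mean 1/p."""
    k = 1
    while rng.random() >= p:
        k += 1
    return k

rng = random.Random(0)
n, p, delta = 200, 0.5, 0.5
mean = n / p                       # E[X_1 + ... + X_n] = n/p = 400
exceed = sum(
    1
    for _ in range(1000)
    if sum(geometric(p, rng) for _ in range(n)) >= (1 + delta) * mean
)
# The Chernoff-type bound predicts an exponentially small exceedance frequency;
# here the threshold lies about ten standard deviations above the mean.
assert exceed == 0
```

The exceedance count being zero over 1000 trials is consistent with the exp(−Θ(δ^2 n)) decay of the bound.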
Consider any phase starting when a better (0, 0) pattern individual is generated and ending when an even better (0, 0) pattern individual is generated. We still use a to represent the number of zeros in the best (0, 0) individual at the beginning of the phase; the phase ends once some offspring has less than a zeros. Recall that m_d, d ≥ 0, denotes the set of (0, 0) individuals that have a + d zeros. We now analyze the change of |m_0| until all other individuals have at most a zeros, if possible, during the phase. Let C represent the event that one (0, 0) pattern individual with a zeros is generated in one generation, and D the event that one (0, 1) pattern individual with a zeros is generated in one generation. Assume first that the parent is not from the m_{>a^2} individuals. Let D′ represent the event that one of the m_{>a^2} individuals generates a (0, 1) offspring with a zeros in one generation. Due to the definition of m_d, when m_{>a^2} is nonempty, we have a + a^2 ≤ n, and then a < n^{0.5}; this yields (5). Now we bound Pr[D′], the probability that one of the m_{>a^2} individuals generates a (0, 1) offspring with a zeros in one generation, where the second inequality follows from Lemma 3 and the last inequality follows from Lemma 4 and a ≥ 2.
Hence, from (5) and (7), if we consider the subprocess that merely consists of events C and D, we obtain Pr[C | C ∪ D] ≥ 1/(8e/n^4 + en^{0.5} + 1). Recalling the definition of the phase we consider, it is not difficult to see that at the initial generation of this phase there is only one (0, 0) front individual with a zeros, and that all (0, 1) front individuals, if any exist, are temporarily undefeated individuals of the last phase. Note that with high probability there are at most (1 + δ)(4e + 1)n accumulated temporarily undefeated individuals in the whole process. Since µ ≥ 2(1 + δ)(4e + 1)n, there are at least (4e + 1)n − 1 individuals that have more than a zeros. Hence, it requires at least (4e + 1)n − 1 steps of the subprocess to replace these individuals with more than a zeros. Let Y be the number of times that C happens in (4e + 1)n − 1 steps of the subprocess; then E[Y] ≥ ((4e + 1)n − 1)/(8e/n^4 + en^{0.5} + 1) ≥ 2n^{0.5}. A simple Chernoff inequality [Doe11, Corollary 1.10(a)] gives that Pr[Y < n^{0.5}] ≤ exp(−n^{0.5}/8). That is, with probability at least 1 − exp(−n^{0.5}/8), |m_0| increases by at least n^{0.5} if the current phase does not end before all individuals have at most a zeros. Now consider the event that all (0, 0) front individuals are replaced, that is, |m_0| decreases to 0, when all individuals have at most a zeros. Since all individuals have at most a zeros, we have Σ_{d>0} |m_d| = 0. In this case, the probability of generating a better (0, 0) offspring, denoted as event F, is at least (|m_0|/(eµ)) · ((a − 1)/n), and the probability that a (0, 1) offspring with a zeros is introduced and one (0, 0) individual is replaced, denoted as event G, is at most (4e + 1)/(n^2 µ). We pessimistically assume that |m_0| ≤ (4e + 1)n and does not increase until the end of the phase. Then Pr[F | F ∪ G] ≥ 1/2, so the probability that G happens n^{0.5} times while F does not happen is at most (1/2)^{n^{0.5}}. Since there are at most n − 1 phases
before a = 1, we know by a union bound over the phases that the probability that the (0, 1) individuals with the same fitness as the best (0, 0) individuals cannot take over all best (0, 0) individuals is at least 1 − (n − 1)((1/2)^{n^{0.5}} + exp(−n^{0.5}/8)) = 1 − o(1). The above analyses are based on a ≥ 2. Now we consider the remaining case a = 1. For a = 1, note that all the analysis for the phase starting when a better (0, 0) pattern individual is generated and ending when an even better (0, 0) pattern individual is generated still holds, except the last result in (6). At the beginning of the last phase, that is, when a = 1, there are at least (4e + 1)n individuals with at least 1 zero. If the optimum is not found before all individuals with at least 2 zeros are replaced, then with probability at least 1 − exp(−n^{0.5}/8), there will be at least n^{0.5} + 1 (0, 0) individuals with a = 1. In this case, the probability of finding the optimum in one generation, denoted as event F′, is |m_0|/(enµ). Thus, for the last phase, the optimum can be reached with probability at least 1 − (1/2)^{n^{0.5}}. Combining all the discussions above proves the theorem.
Compared with the (1 + 1) EA, since a (1, 1^n) individual, corresponding to Event II in the (1 + 1) EA, has no fitness advantage over individuals with previous first bit value 0, it is easily replaced by offspring with previous first bit value 0 in a population. Thus, this stagnation case cannot take over the whole population and cause the stagnation of the (µ+1) EA. The other possible stagnation case, that (0, 1) pattern individuals take over the population, corresponding to Event I in the (1 + 1) EA, will not happen with high probability: with a sufficiently large population size, Ω(n) for problem size n, with high probability the fertile (0, 0) pattern can be maintained until the optimum is reached. That is, the population of the (µ + 1) EA increases the tolerance to the incorrect (0, 1) pattern trials.
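To make this intuition concrete, the following is a minimal sketch of the (µ + 1) EA on one plausible reading of OneMax(0,1^n), assuming the fitness f(x^{t−1}, x^t) = Σ_i x^t_i − n · x^{t−1}_1, which matches the facts used in the proofs (previous first bit 1 forces fitness at most 0; previous first bit 0 gives fitness at least 0). The function names and parameters are illustrative, not from the paper:

```python
import random

def fitness(prev_first_bit, x):
    # Previous first bit 1 is penalized by n = len(x); otherwise plain OneMax.
    return sum(x) - len(x) * prev_first_bit

def mu_plus_one_ea(n, mu, budget, rng):
    """(mu+1) EA; each individual stores (previous first bit, bit string, fitness)."""
    pop = []
    for _ in range(mu):
        x0 = [rng.randint(0, 1) for _ in range(n)]
        x1 = [rng.randint(0, 1) for _ in range(n)]
        pop.append((x0[0], x1, fitness(x0[0], x1)))
    for _ in range(budget):
        _, x, _ = rng.choice(pop)                      # uniform parent selection
        y = [b ^ (rng.random() < 1.0 / n) for b in x]  # standard bit mutation
        pop.append((x[0], y, fitness(x[0], y)))        # offspring's history is its parent
        pop.remove(min(pop, key=lambda ind: ind[2]))   # remove one worst individual
    return max(ind[2] for ind in pop)
```

Under this plus-selection the best fitness never decreases; with µ = Θ(n), as Theorem 5 suggests, the (0, 0) pattern can be maintained long enough for the best fitness to climb towards the optimum value n, while with µ = 1 the same loop reduces to the (1+1) EA, which stagnates with high probability.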
4.3 Runtime Analysis of (µ + 1) EA on OneMax(0,1^n)

Theorem 5 only shows the probability that the (µ + 1) EA can reach the optimum. One further question concerns its runtime. Here, we give some comments on the runtime complexity. For the runtime of the (µ + 1) EA on the original OneMax function, Witt [Wit06] shows an upper bound of O(µn + n log n) on the expected runtime, based on replicas of the current best individuals and fitness increases. Analogously, for the (µ + 1) EA on the OneMax(0,1^n) function, we could consider the expected time until the number of current (0, 0) pattern front individuals with a zeros reaches n/a, that is, |m_0| ≥ n/a, and the expected time until a (0, 0) pattern offspring with less than a zeros is generated when there are n/a current (0, 0) pattern front individuals with a zeros. From the proof of Theorem 5, with probability at least 1 − exp(−δ^2(n − 1)/(2(1 + δ))), there are at most (1 + δ)(eµ/n^{2−c} + 3e + 1)n accumulated temporarily undefeated (0, 1) pattern individuals before the optimum is found. Hence, for µ ≥ 2(1 + δ)(4e + 1)n, in each generation before the optimum is reached, it always holds that at least half of the individuals of the whole population, that is, at least (1 + δ)(4e + 1)n individuals, are current front individuals or interior individuals. Hence, we can simply discuss the population containing no temporarily undefeated individual and take twice the resulting upper bound on the expected time to reach the optimum as a bound for the true process. The (0, 1) pattern offspring with a zeros will not influence the evolving process of the current (0, 0) pattern front individuals we focus on until all other interior individuals are replaced. Recalling the proof of Theorem 5, we know that with probability at least 1 − exp(−n^{0.5}/8), |m_0| ≥ n^{0.5} if no better (0, 0) offspring is generated before all individuals have at most a zeros. Hence, we just need to focus on the case n/a ≥ n^{0.5}, that is, a ≤ n^{0.5}.
Consider the phase that starts once the current (0, 0) pattern front individual has a zeros and ends when a better (0, 0) pattern offspring is generated. We discuss the expected length of this phase. When |m_0| is less than n/a, we consider the event that one replica of an m_0 individual enters the next generation. When the population contains interior individual(s), this probability is at least (|m_0|/µ)(1 − 1/n)^n ≥ |m_0|/(2eµ). When there is no interior individual in the population, we require |m_0| to be less than 2n/a. Let µ′ denote the number of current front individuals; then we obtain a lower bound on the probability of event H that one replica of an m_0 individual enters the next generation. Note that the probability of event G, that an m_0 individual generates a (0, 1) pattern offspring with the same fitness that successfully enters the next generation, admits an upper bound whose antepenultimate inequality uses µ′ ≥ (1 + δ)(4e + 1)n and whose penultimate inequality uses |m_0| ≤ 2n/a. Hence, the expected number of times that H happens before G happens is at least n. Then a simple Chernoff inequality gives that, with probability at least 1 − exp(−((a − 2)/a)^2 n/2) ≥ 1 − exp(−(1 − 2/n^{0.5})^2 n/2) ≥ 1 − exp(−n/18) for n ≥ 9 and a ≤ n^{0.5}, H happens at least 2n/a times before G happens.
When |m_0| goes above 2n/a, we consider the event that one (0, 0) pattern offspring with less than a zeros is generated. Recalling the proof of Theorem 5, we know that Pr[F], the probability of generating a better (0, 0) offspring, is greater than Pr[G], the probability of generating a (0, 1) pattern offspring with the same fitness that successfully enters the next generation. Hence, with probability at least 1 − (1/2)^{n^{0.5}}, F happens once before G happens n/a times, thus before |m_0| goes below n/a.
Recall that the expected runtime of the (µ+1) EA on OneMax is O(µn + n log n) [Wit06], which is O(n log n) for µ = O(log n). Since µ = Ω(n) is required for convergence on OneMax(0,1^n) in Section 4.2, Theorem 6 shows that the expected runtime on OneMax(0,1^n) is O(n^2) if we choose µ = Θ(n), which is the same complexity as for OneMax with µ = Θ(n). To this degree, we may say that the cost for the (µ + 1) EA to solve the time-linkage OneMax(0,1^n) mainly lies in the o(1) probability of non-convergence, not in the asymptotic complexity.

Conclusion and Future Work
In recent decades, rigorous theoretical analyses of EAs have progressed significantly. However, although many real-world applications have the time-linkage property, that is, the objective function relies on solutions from more than one time step, the theoretical analysis of fitness functions with the time-linkage property remains an open problem.
This paper took the first step into this open area. We designed the time-linkage problem OneMax(0,1^n), which integrates the first bit value of the previous time step, with an opposite preference, into the basic OneMax function. Via this problem, we showed that EAs with a population can prevent stagnation in some deceptive situations caused by the time-linkage property. More specifically, we proved that the simple RLS and the (1 + 1) EA cannot reach the optimum of OneMax(0,1^n) with probability 1 − o(1), but the (µ + 1) EA can find the optimum with probability 1 − o(1).
The time-linkage OneMax(0,1^n) problem is simple: only the last generation and the first bit value of the historical solutions matter for the fitness function. Our future work will consider more general time-linkage pseudo-Boolean functions and problems with practical backgrounds.