Stochastic models of Jaya and semi-steady-state Jaya algorithms

We build stochastic models for analyzing Jaya and semi-steady-state Jaya algorithms. The analysis shows that for semi-steady-state Jaya (a) the maximum expected value of the number of worst-index updates per generation is a paltry 1.7 regardless of the population size; (b) regardless of the population size, the expectation of the number of best-index updates per generation decreases monotonically with generations; (c) exact upper bounds as well as asymptotics of the expected best-update counts can be obtained for specific distributions: the upper bound is 0.5 for the normal and logistic distributions, $\ln 2$ for the uniform distribution, and $e^{-\gamma} \ln 2$ for the exponential distribution, where $\gamma$ is the Euler-Mascheroni constant; the asymptotic value is $e^{-\gamma} \ln 2$ for the logistic and exponential distributions and $\ln 2$ for the uniform distribution (the asymptotic value cannot be obtained analytically for the normal distribution). The models lead to the derivation of computational complexities of Jaya and semi-steady-state Jaya. The theoretical analysis is supported with empirical results on a benchmark suite. The insights provided by our stochastic models should help design new, improved population-based search/optimization heuristics.


Introduction
The Jaya algorithm (technically, heuristic or meta-heuristic) [1,2] is one of the newest members of the evolutionary computation family. This algorithm and its variants have been highly successful in global optimization in continuous domains and have seen wide applicability in diverse areas, including engineering [3][4][5], manufacturing [6], energy [7], fuel cells [8], healthcare [9] and finance [10]. A recent survey can be found in [11]. Within the genetic and evolutionary computation family, Jaya is unique in its use of a minimum number of parameters, a fact that doubtless contributes to this algorithm's popularity among practitioners. The semi-steady-state Jaya (SJaya for short) [12] has been shown to outperform the standard Jaya on benchmark problems, with the improvement in performance attributed primarily to the new update strategies that SJaya employs for the best and worst members of the population.
Despite this explosive growth in popularity, no theoretical analysis of Jaya or its variants has, to our knowledge, been reported in the literature. Such analysis is fundamental to our understanding of why the method works the way it does and is a necessary prerequisite to designing better, newer methods for tackling hard optimization problems. This paper provides a rigorous theoretical underpinning of this powerful algorithm, modeling it as a stochastic process and deriving bounds and asymptotics for important performance metrics. The model allows us to investigate the costs of the update strategies, revealing several interesting facts about the working of Jaya and SJaya and leading to the derivation of the computational complexities of the algorithms.
The Jaya pseudocode [8] and the SJaya pseudocode [12] are presented as Algorithms 1 and 2, respectively.

Framework for the analysis
We assume, without loss of generality, an indexed representation (e.g., an array) (Fig. 1) of the members of the population. The best and the worst members (individuals) are determined with respect to the fitness / utility / cost or some objective function. A single run of Jaya or SJaya comprises a number (G, say) of generations, and each generation consists of n steps or iterations, where n is the population size. A single iteration involves determining whether or not the member at index i (i = 1, 2, · · · , n) is to be replaced with a new individual. Clearly, it does not matter whether we traverse the population (array) in a top-to-bottom, bottom-to-top or any other fashion, as long as no index is left out or considered more than once. Suppose, for ease of discussion, we traverse the population in Fig. 1 sequentially from the top (index n) to the bottom (index 1).

Updating SJaya's worst-of-population index
Because the population changes with time, the index of the population's worst individual is time-dependent; that is, the worst individual's index may change after every replacement of the current (most recent) worst individual. Thus it is possible for the (current) worst individual to be encountered more than once during the top-to-bottom scan in a given generation of the population. The present analysis assumes that when the worst individual is encountered, it is replaced with a new (better or identical-fitness) individual with probability p. We also assume that the value of p does not change during a run.
Let us use the name findWorst() to indicate the function called to find the index of the worst individual in the population, and let worstIndex represent the index of the worst individual at any point in the course of a run. Because a simple linear scan of the population suffices to find the worst fitness, the complexity of findWorst() is Θ(n).
Let X be the (discrete) random variable representing the total number of calls, in an entire generation, to the function findWorst(). We are interested in finding the expected value of X, because the higher this expectation, the higher the cost of SJaya.
Suppose that at the beginning of a new generation, the worst individual in the entire population is at index k, i.e., worstIndex is k, with 1 ≤ k ≤ n. During the course of the generation, when this individual at index k is encountered, it will either stay unaltered or be replaced with a new individual. As mentioned earlier, the probability of replacement is assumed to be independent of k and equal to p; thus the individual at index k stays unaltered with probability 1 − p. If it stays unchanged, worstIndex stays unaltered. If, however, it undergoes replacement, we must find (by using a call to findWorst()) which individual in the population is the new worst (it is possible that the newly arrived individual at index k, while better than the individual just replaced, turns out to be the worst in the population at that point in time). The new worst individual, as identified by the above-mentioned call to findWorst(), must be

• either in the already-traversed portion of the population array (at an index between k and n, inclusive, in Fig. 1);

• or in the yet-to-be-traversed part of the population (at an index h, with 1 ≤ h ≤ k − 1).
In the first case above, no further call to findWorst() is needed for the rest of the generation, while in the second, the story repeats itself with the new worst individual, necessitating at least zero and at most h further calls to findWorst() for the rest of the current generation. Let W be the discrete random variable representing the index of the worst individual in the population at a particular iteration of a particular generation during the execution of a run. Let us use t to represent the iteration number (not to be confused with the generation number, for which we will use the notation g). Thus 1 ≤ t ≤ n and 1 ≤ g ≤ G.
For the n steps (iterations) in any generation of the SJaya, the corresponding variables are $W^{(0)}$ (the index of the worst individual at the start of a new generation), $W^{(1)}$ (the worst individual's index after the first iteration of the generation is over), and so on. Thus $W^{(n)}$ of a given generation is the same as $W^{(0)}$ of the immediately following generation. At the very beginning, under the assumption that the initial population is randomly generated, all slots of the array in Fig. 1 are equally likely to hold the worst individual (P stands for probability): $P(W^{(0)} = k \mid n) = 1/n$ for $k = 1, 2, \cdots, n$.
As the iterations (and generations) roll on, the distribution of the worst individual in the population may deviate from the uniform, depending on the policy used to update the population. That is, for t > 0, we do not in general have a strong reason to assume a uniform distribution for $P(W^{(t)} = k \mid n)$. Now, for the present analysis, we do not need $P(W^{(t)} = k \mid n)$ as much as we need the conditional probability $P(W^{(t+1)} = j \mid W^{(t)} = k; n)$, which, in the absence of any further information, is assumed to be uniform (at all generations): $P(W^{(t+1)} = j \mid W^{(t)} = k; n) = 1/n$ for any (j, k) pair and any t.
In the course of a generation, when the worst individual is encountered, a new individual is produced and compared against it (what happens to the worst individual is no different from what happens to every other individual in the population at the given generation). If the new individual has a fitness better than or equal to that of the worst individual, the former replaces the latter, thereby necessitating the finding of which individual in the post-replacement state is the (new) worst in the population. This entails one call to findWorst(). Therefore, in the event of the replacement of the worst individual, at least one call must be made to findWorst(). Thus, for an entire generation (recall the definitions of X and p), we obtain equation 3. Given $W^{(0)} = k$ for a certain generation (recall that one generation equals n iterations), the variable X can assume one of the values 0, 1, 2, · · · , k for that (entire) generation, and equation 3 can be written more specifically in these terms. Assuming $W^{(0)} = k$, consider the top-to-bottom traversal in Fig. 1. The index k may be thought of as a point of demarcation, splitting the population into a top part of size n − k + 1 and a bottom part of size k − 1. Given $W^{(0)} = k$, we can describe the result of a call to findWorst() as either an "up" move (when the index returned by findWorst() is ≥ k) or a "down" move (when the returned index is < k).
Thus, given $W^{(0)} = k$, the event X = 1 takes place when, starting at index k, we either move "up" once, never to move anywhere else, or move "down" once and do not move further. The event X = 2 occurs in one of two scenarios: (a) starting from slot k, the first move is a "down" move to a slot i ∈ [1, k − 1], and the second is an "up" move to any slot ∈ [i, n]; or (b) the first two moves are both "down", followed by no further movement. Similarly, X is 3 when we have either (a) a "down" move from k to a location i ∈ [2, k − 1], followed by a second "down" move from i to a location j ∈ [1, i − 1], followed, finally, by an "up" move from j to a location ∈ [j, n]; or (b) three successive "down" moves followed by no further movement.
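The up/down move process just described can be sanity-checked with a small Monte-Carlo sketch under the conditional-uniform assumption of equation 2. The code below is illustrative only; the function name simulate_generation and all parameter choices are ours, not from the paper.

```python
import random

def simulate_generation(n, p, rng):
    """One generation of the worst-index process: start at a uniformly random
    worst index; on each encounter the worst is replaced with probability p,
    triggering one findWorst() call whose result is (by the conditional-uniform
    assumption) equally likely to be any of the n slots."""
    pos = rng.randint(1, n)        # W(0): initial worst index, uniform on 1..n
    calls = 0
    while True:
        if rng.random() >= p:      # worst individual not replaced: no call
            break
        calls += 1                 # one findWorst() call
        new_pos = rng.randint(1, n)
        if new_pos >= pos:         # "up" move: lands in the traversed region
            break
        pos = new_pos              # "down" move: will be encountered again
    return calls

rng = random.Random(42)
n, trials = 500, 200_000
avg = sum(simulate_generation(n, 1.0, rng) for _ in range(trials)) / trials
print(avg)   # with p = 1, typically close to 1.7
```

With p = 1 the simulated average sits near the ≈1.7 figure that the analysis reports for the maximum expected worst-update count.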

The general case
For the general case X = m, where m ∈ [1, n], we have the following theorem, stated as equation 9 (the product notation Π evaluates to 1 when its upper bound is less than its lower bound). Proof: We proceed by induction. The base cases m = 1, 2 and 3 are already established via equations 6, 7 and 8. The proof will be complete when, assuming the theorem holds for m = q ≥ 1, we show that it holds for m = q + 1.
Starting from location k, any (q + 1)-move sequence comprises a first "down" move to a location i ∈ [q, k − 1], followed by a sequence of q further moves, the first of which starts at location i. This yields equation 10. Substituting for $P(X = q \mid W^{(0)} = i \ge q; n)$ from the theorem (equation 9) into equation 10, and applying two identities that can be established after some algebra, simplification yields the claimed expression for m = q + 1. Q.E.D.

The expectation of X, given a particular $W^{(0)}$ and a particular n, now follows, and averaging over $W^{(0)}$ gives the expectation of X for a given n. The probability $P(X = m \mid W^{(0)} = k \ge m; n)$, and hence the expectation $E(X \mid n)$, are monotone increasing in p, with $E(X \mid n)$ reaching its highest possible value when p = 1. Table 1 presents the analytically obtained maximum value of $E(X \mid n)$ for different values of n. It is interesting to note from Table 1 that this maximum is almost constant, regardless of the population size. Given the existing body of research on population sizing in evolutionary computation, the spread of population sizes in Table 1 is wide enough to include almost all cases of practical interest. Ignoring values of n below 50 as too small, we arrive at the rather remarkable conclusion that the maximum expected number of worst-index updates per generation is about 1.7 for almost any population size.
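Under our reading of the model (conditional-uniform new-worst index, uniform initial worst index), the expectation $E(X \mid n)$ satisfies the recursion $E(X \mid W^{(0)} = k) = p\,(1 + \frac{1}{n}\sum_{j<k} E(X \mid W^{(0)} = j))$, which can be evaluated exactly by a short dynamic program. The code and the recursion are our own sketch, not formulas quoted from the paper.

```python
import math

def expected_worst_updates(n, p):
    # E[k] = E(X | W(0) = k): expected findWorst() calls in one generation,
    # under the conditional-uniform assumption of the analysis.
    E = [0.0] * (n + 1)
    prefix = 0.0                     # running sum E[1] + ... + E[k-1]
    for k in range(1, n + 1):
        E[k] = p * (1.0 + prefix / n)
        prefix += E[k]
    return sum(E[1:]) / n            # average over the uniform initial index

print(expected_worst_updates(1000, 1.0))   # about 1.717 for n = 1000
```

For p = 1 this recursion telescopes to $(1 + 1/n)^n - 1$, which approaches $e - 1 \approx 1.718$ as n grows, consistent with the near-constant ≈1.7 maximum reported in Table 1.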

Empirical results
We obtain empirical estimates of the probability p and the expectation E(X|n) by aggregating (averaging) results from multiple, independent runs of SJaya. Table 3 presents the empirical average and the theoretical expectation for the functions in the benchmark test-suite of Table 2 (taken from [12]); each row in Table 3 corresponds to one benchmark function. The empirical p value is obtained as the average of 500 probabilities, each calculated as a relative frequency from a single run, the data for a single run having been aggregated over the 20 generations comprising the run. In other words, two levels of aggregation (averaging) were employed: over runs, and over generations within a single run. While the runs are independent of one another, the generations that make up a single run are not strictly independent, having been created on top of one another, as in a chain or cascade. The empirical expectation of X is obtained as the average (per generation per run) number of times the worst individual in the population needs to be found.
The empirical E is seen to be higher than the corresponding theoretical value in all the cases in Table 3. This is explained by the fact that, in practice, the distribution on the left side of equation 2 deviates from the uniform; what we have in practice is a non-uniform conditional distribution (with the indexing scheme as in Figure 1). This non-uniform distribution is difficult to obtain analytically. It can, of course, be qualitatively argued that in a top-to-bottom processing of the population elements, the top part (comprising indices n, n − 1, · · · , k; 1 ≤ k ≤ n; see Figure 1) gets updated before the bottom part does, and since an update never results in a worse fitness, the probability of the worst individual being found in the bottom part is higher than in the top part at any point during a generation. An empirical corroboration of this can be seen in the matrix of $P(W^{(\text{next})} = j \mid W^{(\text{current})} = k)$ values obtained by averaging (in a relative-frequency sense) 5000 independent runs of SJaya on the Chung-Reynolds function (Table 2) of 10 variables, where each run used 10 generations of a population of size 10. This matrix, which is clearly a stochastic matrix (each row-sum is unity, ignoring floating-point errors), shows that in each row, the entries to the left of the diagonal element are smaller than those to its right. The very first or initial (before any replacement has taken place) distribution of the worstIndex, obtained from these 5000 runs and presented in Table 4, supports the uniform-distribution assumption used in equation 1.

Updating SJaya's best-of-population index
This section will show that the average generation-wise number of best-updates is typically small and thus does not add significantly to the computational cost of SJaya. We establish the smallness of the best-update count both empirically and theoretically.
A knowledge of the expected number of updates, in a generation, of the best-of-population index is required for an analysis of SJaya. To derive this expectation, we need the underlying (discrete) probability distribution. To compute the probability that a new individual, created in line 6 of Algorithm 1 or Algorithm 2, will be better than an existing individual, we need, among other pieces of information, a knowledge of the (typically continuous) distribution of the fitness landscape. The fitness distribution is impossible to know (in advance), except in trivial cases. A generic analysis, however, is possible if we are willing to make an assumption about the nature of this distribution. In the absence of any further information, we will proceed with the assumption that this distribution is normal (Gaussian). The two parameters of the normal distribution (mean and variance) affect the analysis quantitatively, not qualitatively. Therefore, for ease of calculation, we will use the normal distribution with mean µ and variance σ².
An update of the best index is needed whenever the newly arrived individual has a fitness better than that of the current population-best. During the course of a run, the expected value of the population-best fitness at the beginning of a fresh generation can be modeled as the expectation of the best (either maximum or minimum, depending on the application) of n i.i.d. samples drawn from a given fitness distribution, with n representing the population size. (The population size is assumed not to change from generation to generation, of course.) We assume maximization without loss of generality.
Let us use f for probability density function (pdf) and F for cumulative distribution function (cdf). The maximum of n i.i.d. samples $x_1, \cdots, x_n$ of a continuous random variable X is another (continuous) random variable; call it $X_{\max}$. Then the expected value of $X_{\max}$ is given by

$E(X_{\max}) = \int_{-\infty}^{\infty} x \, f_{X_{\max}}(x) \, dx$,     (16)

where $f_{X_{\max}}(x) = n \, [F_X(x)]^{n-1} f_X(x)$ is the pdf of $X_{\max}$, and $F_{X_{\max}}(x) = [F_X(x)]^n$ is the cdf of $X_{\max}$, with $F_X(x)$ the cdf of X and $f_X(x)$ its pdf.
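As a quick numerical sanity check of the $E(X_{\max})$ integral, the sketch below (our own illustrative code; the helper name e_xmax_uniform is ours) integrates $x \, n \, F^{n-1} f$ by a midpoint rule for Unif(0, 1), where the closed form $n/(n+1)$ is known.

```python
def e_xmax_uniform(n, steps=20_000):
    # Midpoint-rule integration of x * n * F(x)^(n-1) * f(x) over [0, 1]
    # for Unif(0,1), where f(x) = 1 and F(x) = x.
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * n * x ** (n - 1) * h
    return total

for n in (1, 5, 50):
    print(n, e_xmax_uniform(n), n / (n + 1))   # the two columns should agree
```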
To derive the expected number of updates, over a complete generation, of the population-best member, we begin by defining a discrete (binary) random variable $Y_{i,g;n,F}$ representing whether or not an update is made at iteration $i \in \{1, \cdots, n\}$ of generation $g \in \{1, \cdots, G\}$, with the initial generation (g = 0) assumed to have filled the population for the very first time. In other words, $Y_{i,g;n,F}$ is the indicator variable $\mathbf{1}_{x > E(X_{\max} \mid gn+i-1, F_X)}$. The expectation of $Y_{i,g;n,F}$ is given by

$E(Y_{i,g;n,F}) = P(x > E(X_{\max} \mid gn+i-1, F_X))$.

If $Y_{g;n,F}$ denotes a random variable representing the total number of updates in a given generation g, we have $Y_{g;n,F} = \sum_{i=1}^{n} Y_{i,g;n,F}$, and the expectation of $Y_{g;n,F}$ is then obtained as

$E(Y_{g;n,F}) = \sum_{i=1}^{n} P(x > E(X_{\max} \mid gn+i-1, F_X))$,     (25)

where linearity of expectation has been used (the linearity is applicable regardless of whether or not the $Y_{i,g;n,F}$'s are independent). From the definition of $X_{\max}$ it follows that $E(X_{\max} \mid n_2, F_X) > E(X_{\max} \mid n_1, F_X)$ for $n_2 > n_1$, which implies $P(x > E(X_{\max} \mid n_2, F_X)) < P(x > E(X_{\max} \mid n_1, F_X))$.

Thus we have, for $i_2 > i_1$,

$E(X_{\max} \mid gn+i_2-1, F_X) > E(X_{\max} \mid gn+i_1-1, F_X)$,

or equivalently, $P(x > E(X_{\max} \mid gn+i_2-1, F_X)) < P(x > E(X_{\max} \mid gn+i_1-1, F_X))$; equivalently again, $E(Y_{i_2,g;n,F}) < E(Y_{i_1,g;n,F})$ for $i_2 > i_1$.
Therefore $E(X_{\max} \mid (g+1)n, F_X) > E(X_{\max} \mid gn+n-1, F_X)$, which shows that the $E(Y_{i,g;n,F})$ value corresponding to the last iteration of any generation is strictly greater than that corresponding to the first iteration of the immediately following generation. Inequalities 26-30 lead to

$E(Y_{g_2;n,F}) < E(Y_{g_1;n,F})$ for $g_2 > g_1 \ge 1$.
Note that the above inequality holds for any $F_X$ (equivalently, for any $f_X$) and for any n. Thus we have proved the following theorem.

Theorem 2: For any problem, in any run, the expected generation-wise best-update count decreases monotonically with generations, regardless of the population size.
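Theorem 2 can be illustrated concretely for a uniform fitness model (see the special cases below), where $P(x > E(X_{\max} \mid m)) = 1/(m+1)$, so the generation-g expectation reduces to $E(Y_{g;n,\mathrm{Unif}}) = \sum_{i=1}^{n} 1/(gn+i)$. The function name and parameter choices below are ours, for illustration only.

```python
from fractions import Fraction

def expected_best_updates(g, n):
    # Generation-g expectation for Unif(a, b): with m = g*n + i - 1 samples
    # produced so far, P(x > E(X_max | m)) = 1/(m + 1), so
    # E(Y_g) = sum_{i=1..n} 1/(g*n + i), computed exactly with rationals.
    return sum(Fraction(1, g * n + i) for i in range(1, n + 1))

print([float(expected_best_updates(g, 50)) for g in range(1, 6)])
# the values decrease strictly with g, as Theorem 2 asserts
```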
Let $\bar{Y}$ stand for the average (over all the generations in a run) of the expected generation-wise best-update counts. Theorem 2 then implies that $\bar{Y}$ can be made arbitrarily small by making the total number of generations G arbitrarily large, which allows us to argue that the number of best-updates per generation (or per run) should not be a concern as far as computational costs are considered. That argument is theoretically sound ($\bar{Y}$ does indeed tend to 0 as $G \to \infty$), and empirical results (Section 4.2) show that $E(Y_{1;n,F})$ is small even for large n and that the update count drops fast with generations. The caveat is that the true density $f_X$ (equivalently, the true cdf $F_X$) always remains unknown, and for an arbitrary $f_X$ it is difficult, if not impossible, to obtain a tight upper bound on $E(Y_{1;n,F})$; hence a proof that the average generation-wise best-update count is guaranteed, regardless of the problem, to drop to a specified (small) value within a specified (modest) number of generations remains elusive. Below we consider four particular distributions for which we establish upper bounds on $E(Y_{1;n,F})$; these four cases are potential candidates for approximations to the true (unknown) distribution.

Special cases

The uniform distribution
For the Unif(a, b) distribution, the density is given by $f_X(x) = 1/(b-a)$ for $a \le x \le b$ (and 0 elsewhere), and the corresponding cdf is $F_X(x) = (x-a)/(b-a)$ on [a, b]. Using these in equation 16 gives

$E(X_{\max} \mid n, \mathrm{Unif}) = a + (b-a)\,\frac{n}{n+1}$,

from which we get $P(x > E(X_{\max} \mid m, \mathrm{Unif})) = 1 - \frac{m}{m+1} = \frac{1}{m+1}$. This expression allows us to obtain $E(Y_{1;n,\mathrm{Unif}})$ from equation 25 as

$E(Y_{1;n,\mathrm{Unif}}) = \sum_{i=1}^{n} \frac{1}{n+i} = H_{2n} - H_n$,

where $H_n$ is the n-th harmonic number. It is not difficult to prove from either of the above two expressions that $E(Y_{1;n,\mathrm{Unif}})$ is monotone increasing in n. Luckily, an upper bound on $E(Y_{1;n,\mathrm{Unif}})$ can be obtained using the fact that $\lim_{n \to \infty} (H_{2n} - H_n) = \ln 2$. Thus, for any n, no matter how large, and any a and b,

$E(Y_{1;n,\mathrm{Unif}}) < \ln 2$.

The minimum value of the expectation is 1/2 and corresponds to n = 1 (recall that the mean of the Unif(a, b) distribution is (a + b)/2 and that the area under the pdf box to the right of (a + b)/2 is 1/2). Theoretical expectations of $Y_1$ values corresponding to different population sizes are presented in Table 5, where the corresponding values for three other distributions are also shown.
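The harmonic-number expression, its 1/2 minimum at n = 1, and its $\ln 2$ bound can all be verified with a few lines of code (an illustrative snippet of ours):

```python
import math

def H(n):
    # n-th harmonic number
    return sum(1.0 / j for j in range(1, n + 1))

for n in (1, 10, 100, 10_000):
    print(n, H(2 * n) - H(n))   # climbs from 0.5 toward ln 2 ≈ 0.6931
```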

The exponential distribution
For the exponential distribution, the pdf and cdf are given by $f_X(x) = \lambda e^{-\lambda x}$ and $F_X(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, which lead to

$E(X_{\max} \mid n, \mathrm{Exp}) = \frac{H_n}{\lambda}$.

We then have $P(x > E(X_{\max} \mid m, \mathrm{Exp})) = e^{-\lambda H_m/\lambda} = e^{-H_m}$, from which we obtain (by equation 25)

$E(Y_{1;n,\mathrm{Exp}}) = \sum_{i=1}^{n} e^{-H_{n+i-1}}$.

Now, $E(Y_{1;n,\mathrm{Exp}})$ is monotone non-decreasing in n. Thus the smallest value of this expectation occurs at n = 1, and that value, from equation 45, is $1/e$ or 0.3679, which is corroborated by the fact that the mean of the exponential distribution is $1/\lambda$ and that the area under the pdf to the right of the point $1/\lambda$ is $e^{-\lambda \times (1/\lambda)}$ or $1/e$. The case corresponding to an arbitrarily large n can be studied by using the fact that $H_n = \ln n + \gamma + o(1)$, where $\gamma$ is the Euler-Mascheroni constant; this gives $\lim_{n \to \infty} E(Y_{1;n,\mathrm{Exp}}) = e^{-\gamma} \ln 2$. Table 5 shows how the theoretical $E(Y_{1;n,\mathrm{Exp}})$ varies with n, reaching this limit as n approaches infinity.
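The exponential-case sum, its 1/e minimum at n = 1, and its approach to $e^{-\gamma} \ln 2$ can be checked numerically (our own sketch; function name and parameters are illustrative):

```python
import math

def e_y1_exp(n):
    # E(Y_1) for Exp(lambda): E(X_max | m) = H_m / lambda, hence
    # P(x > E(X_max | m)) = exp(-H_m); sum over m = n .. 2n-1.
    h = sum(1.0 / j for j in range(1, n))    # H_{n-1}
    total = 0.0
    for m in range(n, 2 * n):
        h += 1.0 / m                         # h is now H_m
        total += math.exp(-h)
    return total

gamma = 0.5772156649015329                   # Euler-Mascheroni constant
print(e_y1_exp(1))                           # 1/e ≈ 0.3679, the n = 1 minimum
print(e_y1_exp(5000), math.exp(-gamma) * math.log(2))   # near the n → ∞ limit
```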

The normal distribution
The normal distribution N(µ, σ), with mean µ and standard deviation σ, unfortunately admits of no closed-form expression for $E(X_{\max} \mid n, F_X)$ in general, and the integration in equation 16 must be evaluated numerically. We find $E(Y_{1;n,\mathrm{Norm}})$ numerically from equation 25 (using numerical routines from Python's scipy [13,14] and also from Mathematica [15]). The numerically obtained $Y_1$ values corresponding to different population sizes are presented in Table 5. Note that $Y_1$ is monotone non-increasing with n for the Gaussian. Thus the maximum possible value of $E(Y_{1;n,\mathrm{Norm}})$ is obtained at the smallest possible value of n, namely 1, for which the corresponding $E(X_{\max})$ is clearly the mean, µ, of the distribution; by symmetry (the mean equals the median), $P(x > E(X_{\max} \mid 1, \mathrm{Norm}))$ is 0.5, leading to $E(Y_{1;1,\mathrm{Norm}}) = 0.5$. Thus, for any n, no matter how large, and any µ and σ, $E(Y_{1;n,\mathrm{Norm}}) \le 0.5$.
The limit of the expectation, as n → ∞, cannot be obtained analytically.
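Since the Gaussian case must be handled numerically, the following stdlib-only sketch (using math.erf for the normal cdf; all function names and quadrature choices are ours) reproduces the two facts above: $E(Y_{1;1,\mathrm{Norm}}) = 0.5$, and the decrease with n.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)

def phi(x):   # standard normal pdf
    return math.exp(-0.5 * x * x) / SQRT2PI

def Phi(x):   # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def e_xmax_normal(m, lo=-10.0, hi=10.0, steps=20_000):
    # E(X_max | m) = integral of x * m * Phi(x)^(m-1) * phi(x) dx; there is
    # no closed form, so use a midpoint rule on [-10, 10].
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x * m * Phi(x) ** (m - 1) * phi(x) * h
    return total

def e_y1_norm(n):
    # the equation-25 sum for the standard normal, first generation
    return sum(1.0 - Phi(e_xmax_normal(n + i - 1)) for i in range(1, n + 1))

print(e_y1_norm(1))   # 0.5: E(X_max | 1) is the mean, and P(x > mean) = 1/2
print(e_y1_norm(5))   # smaller, matching the nonincreasing trend in Table 5
```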

The logistic distribution
The logistic distribution offers some similarity (e.g., unimodality, symmetry) to the normal. That, coupled with its amenability to analytical treatment, affords an alternative to the normal distribution for modeling purposes. For simplicity, let us consider location and scale parameters of 0 and 1, respectively. This causes no loss of generality, because any logistic variable X with location a and scale s > 0 can be transformed to a standard logistic variable Z via $Z = (X - a)/s$. The cdf and pdf of the (standard) logistic distribution are given by $F_X(x) = \frac{1}{1 + e^{-x}}$ and $f_X(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$. Thus

$E(X_{\max} \mid n, \mathrm{Log}) = H_{n-1}$,

with $H_0$ taken to be zero (recall that $n \ge 1$). Then, by equation 25,

$E(Y_{1;n,\mathrm{Log}}) = \sum_{i=1}^{n} \frac{1}{1 + e^{H_{n+i-2}}}$,

whose maximum value, 0.5, occurs at n = 1, and whose limit as $n \to \infty$ is $e^{-\gamma} \ln 2$.
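The logistic-case values can be checked numerically in the same way as the exponential ones (our own illustrative sketch):

```python
import math

def e_y1_logistic(n):
    # E(Y_1) for the standard logistic: E(X_max | m) = H_{m-1}, so
    # P(x > E(X_max | m)) = 1 / (1 + exp(H_{m-1})); sum over m = n .. 2n-1.
    h = sum(1.0 / j for j in range(1, n))    # H_{n-1}  (H_0 = 0)
    total = 0.0
    for m in range(n, 2 * n):
        total += 1.0 / (1.0 + math.exp(h))   # h equals H_{m-1} here
        h += 1.0 / m                         # advance to H_m
    return total

gamma = 0.5772156649015329
print(e_y1_logistic(1))                      # exactly 0.5, the n = 1 maximum
print(e_y1_logistic(5000), math.exp(-gamma) * math.log(2))   # near the limit
```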

Empirical results
Empirical values of the average counts of best-updates are obtained by aggregating independent SJaya runs for each of the test functions. For a population size of n, the empirical expectation (average) at a given generation g is produced from an ensemble of r runs as

$E_{\mathrm{empir}}(Y_{g;n}) = \frac{1}{r} \sum_{k=1}^{r} N_k(g)$,

where $N_k(g)$ is the number (an integral count ≥ 0) of updates of the best-index at generation g in run k. The corresponding average over all generations is obtained as

$E_{\mathrm{empir}}(Y_n) = \frac{1}{G} \sum_{g=1}^{G} E_{\mathrm{empir}}(Y_{g;n})$.

In the above two equations, g ≥ 1 (g = 0 represents the initial population). Table 6 shows, for different population sizes, $E_{\mathrm{empir}}(Y_n)$ values as well as how $E_{\mathrm{empir}}(Y_{g;n})$ changes with generations (r = 500 and G = 20 in this table). The runs used in this table are the same as the ones used in Table 3.

Computational costs
While the Jaya algorithm finds the best-of-population member and the worst-of-population member exactly once per generation, SJaya does this on a continuous, as-needed basis. The logic for finding the best (or worst) of a given number of elements can be implemented as a simple sequential scan of the elements, consisting of two basic operations for each element: a comparison followed, conditionally, by an assignment. We now find the costs of these two types of operations.

Comparison and assignment operations for best-index update in SJaya
The total number of comparison operations needed for updating the best index in an entire generation of SJaya (call this number #comp) is equal to the number of times line 9 in Algorithm 2 is executed (i.e., the condition in line 9 is tested) per generation; this number is the same as the number of times the condition in line 7 evaluates to TRUE in a generation. We need to find the expected value of #comp. We can model the new individual being at least as good as the current individual (in line 7) by the event $X_2 \ge X_1$, where $X_1$ and $X_2$ are two independent random samples drawn from the same (unknown) distribution. (As mentioned earlier, this distribution is never truly known, and we have to make do with estimates and approximations.) Defining a random variable $Z_i$ that equals 1 if the condition in line 7 evaluates to TRUE at slot i and 0 otherwise, we have $E(Z_i) = P(X_2 \ge X_1)$. Thus $\#\mathrm{comp} = \sum_{i=1}^{n} Z_i$, and the expectation of the total number of comparison operations in a generation of SJaya is given by

$E(\#\mathrm{comp}) = \sum_{i=1}^{n} E(Z_i) = n \, P(X_2 \ge X_1)$,

where the last step follows from the fact that the events at the n slots of the population are governed by the same underlying distribution. By the law of total probability,

$P(X_2 \ge X_1) = \int_{-\infty}^{\infty} (1 - F_X(x)) \, f_X(x) \, dx$.

If the distribution of X is known, we can analytically obtain E(#comp). For instance, this expectation is n/2 for the exponential, uniform and logistic distributions but cannot be obtained in an explicit closed form for the normal distribution.
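For the uniform and exponential cases, which the text reports as giving E(#comp) = n/2, the law-of-total-probability integral can be evaluated numerically with a short sketch (ours; the helper name and step counts are illustrative):

```python
import math

def p_new_at_least(f, F, lo, hi, steps=100_000):
    # P(X2 >= X1) = integral of (1 - F(x)) * f(x) dx  (midpoint rule)
    h = (hi - lo) / steps
    return sum((1 - F(lo + (i + 0.5) * h)) * f(lo + (i + 0.5) * h) * h
               for i in range(steps))

# Unif(0,1): f = 1, F = x on [0,1];  Exp(1): f = e^{-x}, F = 1 - e^{-x}.
# Both integrals evaluate to 1/2, giving E(#comp) = n/2.
print(p_new_at_least(lambda x: 1.0, lambda x: x, 0.0, 1.0))
print(p_new_at_least(lambda x: math.exp(-x), lambda x: 1 - math.exp(-x), 0.0, 40.0))
```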
Next, the total number of assignment operations (call it #assign) needed for updating the best index in an entire generation of SJaya is equal to the number of times line 10 is executed per generation in Algorithm 2. The expectation of this count, E(#assign), was already derived in Section 4; E(#assign) can be taken to be either the generation-specific $E(Y_{g;n,F})$ or the average $\bar{Y}$. This expectation is obviously a function of the population size n, and Section 4 obtained the maximum value of this expectation (corresponding to either n = 1 or $n \to \infty$, depending on the nature of the distribution) for specific distributions, as follows: $(\ln 2)/e^{\gamma}$ for the exponential distribution, $\ln 2$ for the uniform distribution, 0.5 for the normal distribution, and 0.5 for the logistic distribution.
As mentioned earlier, it is difficult to obtain a closed-form analytical expression of this expectation for arbitrary distributions; however, by Theorem 2, this expectation, when averaged over a number of generations, goes down as the number of generations increases, regardless of the underlying distribution.

Comparison and assignment operations for finding the best/worst of n elements
The naïve approach of sequentially scanning an array for the best (or worst) element entails exactly n (or n − 1, depending on the implementation) comparisons. The number of assignments, however, is not deterministic. Assuming the array index runs from 1 to n, the number of assignments can go from a minimum of 1 to a maximum of k (or from 1 to n − k + 1, depending on the implementation), inclusive, when the best (or worst) element is located at index k. The average-case analysis can be performed by noting that when the numbers are uniformly distributed in the array, the j-th element is greater (smaller) than the preceding j − 1 elements with probability 1/j, independently for all j = 1, · · · , n. Thus the expected number of assignments in a scan is $\sum_{j=1}^{n} 1/j = H_n$.

Complexity of SJaya
In a generation of SJaya (Algorithm 2), the comparison condition (line 7) costs $C_c$, and there are n such checks in a generation. The replacement at line 8 takes place E(#comp) times in a whole generation (recall that the condition at line 7 is true these many times on average in a generation). Again, the condition in line 9 is tested E(#comp) times in a whole generation. And, as already shown in Sec. 5.1, the update in line 10 occurs E(#assign) times per generation. The condition in line 12 is tested E(#comp) times in a generation, and finding the worst individual in line 13 is needed a maximum of 1.7 times per generation. The total cost of a single run is thus

$n\varphi(d) + 2(nC_c + H_n C_a) + G\big[\,C_p d + n(C_{op} d + \varphi(d)) + nC_c + E(\#\mathrm{comp})\, C_a + E(\#\mathrm{comp})\, C_c + E(\#\mathrm{assign})\, C_a + E(\#\mathrm{comp})\, C_c + 1.7(nC_c + H_n C_a)\,\big]$.
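The average-case assignment count claimed above, $H_n$ via the 1/j argument, can be checked with a quick simulation (illustrative code of ours; the function name and trial counts are arbitrary):

```python
import random

def count_assignments(arr):
    # Sequential best-of scan: one comparison per element, but an
    # assignment only when a new running best is found.
    best, assigns = float('-inf'), 0
    for x in arr:
        if x > best:       # comparison
            best = x       # conditional assignment
            assigns += 1
    return assigns

rng = random.Random(7)
n, trials = 100, 20_000
avg = sum(count_assignments([rng.random() for _ in range(n)])
          for _ in range(trials)) / trials
H_n = sum(1.0 / j for j in range(1, n + 1))
print(avg, H_n)   # the average should be close to H_100 ≈ 5.187
```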

Complexity of Jaya
The complexity of Jaya can now be derived easily. Most of the calculations carry over from those of SJaya. Noting that the replacement of the existing individual with the new one (in line 8 of Algorithm 1) takes place E(#comp) times in an entire generation, the complexity is given by

$n\varphi(d) + G\big[\,2(nC_c + H_n C_a) + C_p d + n(C_{op} d + \varphi(d)) + nC_c + E(\#\mathrm{comp})\, C_a\,\big]$.

Cost difference between SJaya and Jaya
Using the upper bound of #assign, namely #comp, and assuming E(#comp) = n/2, we obtain the following estimate of an upper bound on the additional cost incurred by SJaya over Jaya per generation:

additional cost $\le \frac{n}{2}(2C_a + C_c) - 0.3(nC_c + H_n C_a) \approx (n - 0.3\ln n - 0.17316)\,C_a + 0.2\,nC_c$ for large n.
This additional cost is not significant compared to the total cost of evaluating the fitnesses of the n population members in a generation. If, in light of the analysis in Sec. 4, a more realistic value of E(#assign), much smaller than n, is assumed, the additional cost becomes even lower.

Conclusions
A theoretical analysis of two stochastic heuristics, Jaya and its recent improvement SJaya, has been presented in this paper. A remarkable fact revealed by the analysis is that the maximum expected number of worst-index updates per generation for SJaya is only about 1.7 for almost any population size of practical interest. Furthermore, regardless of the population size, the expectation of the number of best-index updates per generation decreases monotonically with generations. We derived exact upper bounds on the expected number of best-index updates when the underlying distribution is exponential, logistic, normal or uniform. Asymptotics of expected best-update counts were obtained for the exponential, logistic and uniform distributions. Limitations of the analytical approach, and the occasional need to resort to numerical techniques, have been pointed out. The model allowed us to obtain computational complexities of the algorithms, which showed that the performance improvement afforded by SJaya over Jaya incurs only a modest additional cost. Empirical results on benchmark test problems were obtained and found to corroborate the theoretical findings. To our knowledge, this is the first theoretical analysis of this powerful and popular family of heuristics. The insights provided by our models should help design new, improved population-based search/optimization heuristics in machine learning / artificial intelligence. The analytical approach developed here has the potential to be extended to the analysis of other types of evolutionary algorithms.