Adaptive Differential Evolution Based on Successful Experience Information

As a powerful optimization algorithm for solving nonlinear, complex and difficult global optimization problems, differential evolution (DE) has been widely applied in various science and engineering fields. In this article, considering that the evolution direction of each individual is not fully exploited to guide the search process in most DE algorithms, a new DE variant (named ADEwSE) is proposed, which incorporates the successful experience of evolved individuals into the classic “current-to-pbest/1” mutation strategy to reduce the randomness of the search direction. Moreover, a crossover matrix sorting scheme based on the real crossover rate, opposition learning of the crossover rate and adaptive adjustment of the top p% values are combined with the new mutation strategy to improve the global search ability. In addition, to further improve the search ability of ADEwSE, a variant that introduces linear reduction of the population size is also proposed. To verify and analyze the performance of ADEwSE, numerical experiments are executed on a set of 29 test problems from the CEC2017 benchmark for 30, 50 and 100 dimensions, and the results are compared with those of 21 state-of-the-art DE-based algorithms. The comparative analysis indicates that ADEwSE and its improved version are competitive with these state-of-the-art DE variants in terms of solution quality.


I. INTRODUCTION
Optimization problems exist in many real-world applications, and the key requirement for a good optimizer is to obtain a satisfactory solution in a reasonable time [1]. Today, many innovative algorithms, such as Grammatical Evolution (GE) [2], Particle Swarm Optimization (PSO) [3], [4], the Differential Search algorithm (DS), the Artificial Bee Colony (ABC) [5], and the Gaining-Sharing Knowledge based algorithm (GSK) [6], have been developed and successfully applied to non-linear, high-dimensional and complex optimization problems [7]. Differential evolution (DE), proposed by Storn and Price [8] in 1995, is a simple but efficient evolutionary algorithm. Like all other evolutionary algorithms (EAs), DE employs a population-based stochastic search method to obtain acceptable results. The advantages of DE are its ease of use, speed, simple structure, and robustness. Therefore, in recent decades, it has drawn more and more attention in many scientific and engineering fields, such as electrical power systems [9], [10], optimal power flow [11], [12], neural network training [13], data mining [14], geophysical inversion [15] and image processing [16], [17]. Generally, DE uses mutation and recombination as search mechanisms to explore new components of potential solutions. It produces a new candidate vector by combining the components of a parent chosen from the same population with the components of the mutant vector. A candidate vector is chosen for the next generation only if it has a better fitness value than its parent vector.
Although DE benefits from the attractive features mentioned above, it is observed that the performance of DE is highly dependent on the configuration of its mutation strategies and control parameters, such as the crossover rate (CR) and the scaling factor (F). Choosing the most appropriate mutation strategy and parameter settings is a perplexing question, since some mutation strategies are effective for global search, others are helpful for local exploitation, and some parameter settings can improve the convergence speed [18]. To obtain better results, the trial-and-error method is usually used to determine the best control parameters and mutation strategy for different problems. In fact, even for the same problem, the required parameters and strategies may vary at different stages of the evolutionary search. Consequently, a considerable number of DE variants have been proposed that tune the control parameters or ensemble mutation strategies by means of adaptive or self-adaptive mechanisms, including jDE [19], JADE [20], SHADE [21], CoBiDE [22], MPEDE [18], jSO [23] and so on. Note that JADE, SHADE and jSO share the same idea of maintaining a balance between exploration and exploitation abilities, and achieve it in various ways. Besides, in most DE variants, the mutation direction is set in a random manner. To control the search direction more heuristically, Jia et al. [24] presented a novel and simple technique to control the search direction of jDE (jdDE). In jdDE, guiding the search direction with the information between the trial vector and its target vector can improve the convergence rate and obtain better solution quality. Li et al. [25] formulated a new evolutionary algorithm framework by introducing the evolution path (EP) into DE.
This framework offers the advantages of both a distributed model (DM) and a centralized model (CM), and hence enhances the performance of DE. Later, Zhang and Yuen [26] proposed a directional mutation (DM) operator. The proposed method yields a larger number of generations with fitness improvement, and saves computational time compared with classic DE. Using an information sharing concept, Lee et al. [27] added the momentum of the target individual to the ''current-to-best/1'' mutation strategy, and adjusted the scaling factor with a fuzzy logic system. Nevertheless, evolution direction information, such as the evolution path and momentum, is rarely used to guide the search process, because incorporating these directions into DE increases the risk of premature convergence without improving the exploration ability.
The motivation of this research stems from two ideas. First, the main aim is to combine the evolution direction information of individuals with the descent direction of the population. Considering that the widely used ''current-to-pbest/1'' mutation strategy already includes the direction information of the top p% individuals, an interesting idea is to design a novel mutation strategy by combining the evolution direction information with the guidance of the top p% individuals. Second, as discussed by Zhang and Yuen [26], using the evolution directions more than once will cause the algorithm to be trapped in local optima due to their greediness. Therefore, it is necessary to introduce some new parameters and modifications of the crossover to improve the exploration and exploitation abilities.
In this study, a successful experience (SE) pool, which includes the evolution direction of each individual, is established based on the selection operator. Unlike the directional mutation algorithm [26], which maintains a difference vector pool using the difference vector between a trial solution and its parent, the update of the successful experience is related to the crossover process. This means that the successful experience stores the evolution direction of all variables of all individuals. A successful experience vector in the SE pool can be seen as a feasible search direction, i.e., a direction along which further fitness improvement is possible. In other words, the successful experience provides past information about the evolutionary search process, which can improve the ability to find the optimal solution. To utilize the successful experience information, an adaptive differential evolution based on successful experience (ADEwSE for short) is proposed. However, merging the SE vectors into DE may easily trap the search in a local optimum and decrease the population diversity. Thus, to increase the exploration ability and keep the diversity of the population, sorting of the crossover matrix, inspired by the crossover rate sorting mechanism in [28], and opposition learning of CR are proposed. In addition, DE is known to suffer from stagnation and/or premature convergence, which significantly deteriorates its performance [29]. Practical experience suggests that no better solution can be created by the mutation strategy and crossover operation when stagnation occurs [30]. Thus, to relieve the problem of stagnation, a disturbance-based crossover (DX) is added, which helps to increase the diversity of potential trial solutions and improve the local exploitation ability.
In summary, the main contributions of this article are as follows:
(1) A novel mutation strategy with successful experience information is proposed. In the proposed mutation operator, the successful experience, which encodes a good search direction, is incorporated into the classical ''current-to-pbest/1'' mutation strategy.
(2) To strengthen the exploration of DE, a crossover matrix sorting scheme is introduced. In the proposed method, the trial vectors generated by superior individuals tend to inherit more components from their parents.
(3) An opposition learning of crossover rate strategy is presented to dynamically balance the exploration and exploitation abilities.
(4) A disturbance crossover (DX) is presented. The DX is executed only when an individual is stagnant.
(5) An adaptive p strategy is proposed, instead of the static p value in JADE.
(6) The performance of ADEwSE is further improved by introducing a linear population size reduction strategy.
This article is organized as follows. Section II introduces the classical DE algorithm. In Section III, some recent related works are surveyed. The proposed algorithm is then presented in detail in Section IV. In Section V, the experimental results and comparisons with other techniques are reported and analyzed. Finally, the conclusion is given in Section VI.

II. DIFFERENTIAL EVOLUTION
DE is a population-based heuristic search method for solving numerical optimization problems. It has four basic processes: initialization, mutation, crossover and selection. Without loss of generality, in this work, the minimization problem min f(x), x ∈ R^D, is considered, where D is the number of dimensions of the search space. DE starts with a random population of real-coded vectors representing solutions of the given problem. Every individual in the population can be denoted as x_i^G = (x_{i,1}^G, x_{i,2}^G, ..., x_{i,D}^G), i = 1, 2, ..., NP, where NP is the population size and G is the generation index. These individuals evolve by executing the mutation, crossover and selection operations iteratively. In the following subsections, a brief summary of the original DE algorithm is provided.

A. INITIALIZATION
Like other EAs, the initial population vectors in DE are randomly generated in the search space constrained by the prescribed minimum and maximum bounds. Thus, the jth decision parameter of the ith individual is initialized via the following formula:

x_{i,j}^0 = L_j + rand(0, 1) · (U_j − L_j)    (1)

where rand(0, 1) denotes a uniformly distributed real number within the range [0, 1], and L_j and U_j are the lower and upper bounds of the jth decision parameter, respectively. Therefore, in the initialization step, NP individuals are generated in the search region defined by the lower bound L and upper bound U, i.e., L = (L_1, L_2, ..., L_D) and U = (U_1, U_2, ..., U_D).
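The initialization step can be sketched in a few lines of Python; NumPy and the function name here are our own illustrative choices, not part of the original description:

```python
import numpy as np

def initialize_population(NP, L, U, rng=None):
    """Random initialization: x_ij = L_j + rand(0,1) * (U_j - L_j)."""
    rng = np.random.default_rng(rng)
    L, U = np.asarray(L, float), np.asarray(U, float)
    return L + rng.random((NP, L.size)) * (U - L)
```

Each row of the returned matrix is one individual, sampled uniformly inside the box [L, U].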

B. MUTATION
After the initialization, the mutation strategy is employed to generate a mutant vector v_i^G for each target vector x_i^G in the current population. Essentially, the formula for mutation can be regarded as an operation that combines one or more difference vectors with a population vector. The most widely used mutation strategies are the following [18]:

''DE/rand/1'':  v_i^G = x_{r1}^G + F · (x_{r2}^G − x_{r3}^G)    (2)
''DE/best/1'':  v_i^G = x_{best}^G + F · (x_{r1}^G − x_{r2}^G)    (3)
''DE/rand/2'':  v_i^G = x_{r1}^G + F · (x_{r2}^G − x_{r3}^G) + F · (x_{r4}^G − x_{r5}^G)    (4)
''DE/best/2'':  v_i^G = x_{best}^G + F · (x_{r1}^G − x_{r2}^G) + F · (x_{r3}^G − x_{r4}^G)    (5)
''DE/current-to-rand/1'':  v_i^G = x_i^G + F · (x_{r1}^G − x_i^G) + F · (x_{r2}^G − x_{r3}^G)    (6)
''DE/current-to-best/1'':  v_i^G = x_i^G + F · (x_{best}^G − x_i^G) + F · (x_{r1}^G − x_{r2}^G)    (7)

The parameters r1, r2, r3, r4, r5 are mutually different random integers within the range [1, NP], which are also different from the index i. The coefficient F is a positive control parameter for scaling the difference vector. x_{best}^G is the individual with the best fitness value at generation G. More details can be found in [31].
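Two of these strategies can be sketched as follows; this is an illustrative Python/NumPy rendering with our own function names, assuming 0-based indexing:

```python
import numpy as np

def mutate_rand_1(pop, i, F, rng):
    """DE/rand/1: v = x_r1 + F * (x_r2 - x_r3), indices distinct from i."""
    NP = len(pop)
    r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
    return pop[r1] + F * (pop[r2] - pop[r3])

def mutate_current_to_best_1(pop, fitness, i, F, rng):
    """DE/current-to-best/1: v = x_i + F*(x_best - x_i) + F*(x_r1 - x_r2)."""
    NP = len(pop)
    best = int(np.argmin(fitness))          # minimization: smaller is better
    r1, r2 = rng.choice([j for j in range(NP) if j != i], 2, replace=False)
    return pop[i] + F * (pop[best] - pop[i]) + F * (pop[r1] - pop[r2])
```

The remaining strategies differ only in the choice of base vector and number of difference vectors.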

C. CROSSOVER
Following the mutation step, the crossover operation is introduced to obtain the trial vector u_i^G by replacing certain variables of the target vector x_i^G with the corresponding variables of the mutant vector v_i^G. In DE, there are two types of crossover scheme: exponential and binomial. Here we elaborate on the binomial crossover, in which at least one component of the trial vector is inherited from the mutant vector. The crossover operation can be defined as follows:

u_{i,j}^G = v_{i,j}^G, if rand(0, 1) ≤ CR or j = j_rand;  u_{i,j}^G = x_{i,j}^G, otherwise    (8)

where CR ∈ (0, 1) is the crossover rate that controls which, or how many, components are inherited from the mutant vector, and j_rand is a randomly selected integer in the range [1, D]. Let b_i^G be a binary string generated for each target vector x_i^G by the following equation [32]:

b_{i,j}^G = 1, if rand(0, 1) ≤ CR or j = j_rand;  b_{i,j}^G = 0, otherwise    (9)

Then, the binomial crossover in Eq. 8 can be rewritten as

u_{i,j}^G = b_{i,j}^G · v_{i,j}^G + (1 − b_{i,j}^G) · x_{i,j}^G    (10)

Based on Eq. 9 and Eq. 10, the binary string is clearly stochastically related to CR, and according to Eq. 10, the components of the trial vector are directly determined by its binary string.
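The binary-string form of the binomial crossover (Eqs. 9 and 10) can be sketched as follows; the function name is our own, and NumPy is used for exposition:

```python
import numpy as np

def binomial_crossover(x, v, CR, rng):
    """Binomial crossover via a binary string b:
    u = b*v + (1-b)*x, with at least one component taken from the mutant."""
    D = x.size
    b = (rng.random(D) <= CR).astype(float)
    b[rng.integers(D)] = 1.0   # j_rand: guarantee one mutant component
    return b * v + (1.0 - b) * x
```

With CR = 1 the trial vector equals the mutant vector; with CR close to 0 it inherits almost everything from the parent except the j_rand component.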

D. SELECTION
In DE, a greedy mechanism is employed to select the better one between the trial vector u_i^G and the target vector x_i^G according to their fitness values. The selection scheme is given by:

x_i^{G+1} = u_i^G, if f(u_i^G) ≤ f(x_i^G);  x_i^{G+1} = x_i^G, otherwise    (11)

In Algorithm 1, a detailed description of classic DE, which utilizes the ''DE/rand/1'' mutation operator, is given.
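Putting the four operators together, classic DE/rand/1/bin (the scheme of Algorithm 1) can be sketched as below; this is a minimal illustration with our own function name and default settings, not the paper's configuration:

```python
import numpy as np

def de_rand_1(f, L, U, NP=30, F=0.5, CR=0.9, max_gen=200, seed=0):
    """Classic DE (DE/rand/1/bin): mutation, crossover, greedy selection."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    pop = L + rng.random((NP, D)) * (U - L)        # initialization (Eq. 1)
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        for i in range(NP):
            # DE/rand/1 mutation (Eq. 2), clipped to the bounds
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), L, U)
            # binomial crossover (Eq. 8)
            mask = rng.random(D) <= CR
            mask[rng.integers(D)] = True
            u = np.where(mask, v, pop[i])
            # greedy selection (Eq. 11)
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    return pop[np.argmin(fit)], fit.min()

# e.g. minimizing the 5-D sphere function
x_best, f_best = de_rand_1(lambda x: float(np.sum(x * x)), [-5.0] * 5, [5.0] * 5)
```

On such a simple unimodal function the loop converges to near zero well within the given budget.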

III. RELATED WORKS
In fact, the performance of the conventional DE algorithm mainly depends on the chosen mutation and crossover strategies and the associated control parameters. Many researchers have contributed to the improvement of DE performance by proposing new techniques [33]. A brief overview of these techniques is given in the following subsections.

A. ADJUSTMENT OF THE CONTROL PARAMETERS
The three main parameters of DE are the scaling factor F, the crossover rate CR and the population size NP, which need to be set during the initialization phase. Finding proper settings of these control parameters is a vital step towards obtaining an optimal result. Storn and Price [34] suggested that a population size between 5D and 10D and an initial scaling factor of 0.5 are sufficient for obtaining a satisfactory solution. Later, Gämperle et al. [35] recommended choosing the population size within the range [3D, 8D], the scaling factor as 0.6 and the crossover rate in [0.3, 0.9]. However, Ronkkonen et al. [36] concluded that taking F as 0.9 initially is a suitable compromise between convergence speed and convergence rate. For the crossover rate, they suggested that a CR value in [0, 0.2] is proper for separable objective functions, while a value in [0.9, 1] is proper when the objective functions are non-separable and multimodal. In addition, Zielinski et al. [37] suggested that setting F ≥ 0.6 and CR ≥ 0.6 is helpful for obtaining good results in many cases. From the above, it can be observed that various parameter settings have been suggested on the basis of manual tuning. These empirical settings differ considerably from each other, which implies that choosing the most proper parameter settings is usually difficult when the properties of the optimization problem are unknown and sufficient experimental justification is lacking [33].
Thus, some adaptive or self-adaptive techniques are designed to produce the control parameters to avoid manually tuning the control parameters.
Liu and Lampinen [38] proposed a fuzzy adaptive differential evolution with a fuzzy logic controller, in which F and CR are adjusted based on the relative fitness values and individuals of successive generations. Likewise, Brest et al. [19] proposed an efficient self-adaptation scheme (jDE) by encoding F and CR into each individual and adapting them by means of evolution. Qin and Suganthan [39] introduced a memory-based self-adaptive differential evolution (SaDE) algorithm. In SaDE, F takes different values in the range (0, 2] drawn from a normal distribution N(0.5, 0.3) for each individual, while the new values of CR are produced based on the successful crossover rate values stored in a memory bank. Zhang and Sanderson [20] presented an adaptive scheme in JADE to update the control parameters to proper values. For each target vector, the scaling factor F and crossover rate CR are drawn from a Cauchy distribution C(µ_F, 0.1) and a normal distribution N(µ_CR, 0.1), respectively, where µ_F and µ_CR are the respective location and mean values of the distributions. Instead of sampling F and CR values from the most recent successful information as in JADE, Tanabe and Fukunaga [21] used historical memory archives M_F and M_CR to generate new (F, CR) pairs by directly sampling the parameter space. Li and Yin [15] proposed a bimodal distribution parameter setting scheme to control the parameters of the mutation and crossover operators, and to balance the exploration and exploitation abilities of DE. Yu et al. [40] introduced a two-level parameter adaptation scheme for DE: dynamically adjusting the population-level scale factor F_p and crossover rate CR_p according to the exploration and exploitation status, and adapting the individual-level control parameters F_i and CR_i based on the fitness value of the individual as well as its distance to the global best individual in the population.
Mohamed and Mohamed [33] presented a novel and effective adaptation scheme, in which the crossover rate is adaptively selected from two pre-determined sets based on their experience of generating promising solutions during evolution, and the scale factor is independently generated according to a uniform distribution in (0.1, 1) at each generation. Zhou et al. [28] modified JADE with a crossover rate sorting strategy: individuals with small fitness values in the population are assigned small CR values. Cui et al. [41] proposed a new self-adaptive adjustment for differential evolution, called ADEDE. In ADEDE, a parameter population is established alongside the solution population and updated from generation to generation under the basic principle that a good parameter individual is more likely to survive at high risk, while bad parameter individuals should have a high probability of learning from good parameter individuals. Sun et al. [42] adjusted the control parameters in the IDDE algorithm by applying the fitness information of each individual and a dynamic fluctuation rule, and obtained a better balance between the exploration and exploitation abilities.
Apart from F and CR, adaptive adjustment of the population size NP has also received much attention in recent years, since the population size significantly impacts the convergence rate of an evolutionary algorithm. The modified jDE algorithm, called dynNP-DE [43], is initialized with a large population size (10D); after each 25% of the allowed function evaluations, NP is reduced by half, down to 1.25D in the last quarter of the run. Tirronen and Neri [44] introduced a population adaptation mechanism based on fitness diversity, calculated by several proposed measures. Zhu et al. [45], in their ATPS-DE algorithm, introduced a new NP adaptation strategy in which a status monitor counts the number of recent consecutive generations with or without improvement of the best solution. Tanabe and Fukunaga [46] further improved SHADE with a linear population size reduction, which continuously reduces the population size according to a linear function. A detailed overview of population size adaptation can be found in [47].

B. MODIFICATIONS OF THE MUTATION STRATEGY, CROSSOVER SCHEME AND SELECTION OPERATOR
To improve the performance of the DE algorithm, researchers have made several attempts to design new kinds of mutation strategy. Das et al. [48] proposed a neighborhood-based mutation operator, which contains two parts, local neighborhood-based mutation and global neighborhood-based mutation, to balance the exploitation and exploration abilities of DE. Zhang and Sanderson [20] improved the optimization performance of DE in JADE by implementing a new mutation strategy, ''current-to-pbest/1'', with an optional external archive. Owing to the external archive and the manner of updating the control parameters, JADE showed significant performance improvement on relatively high-dimensional problems. Mohamed [49] presented a new triangular mutation rule based on the convex combination vector of a triangle to enhance the local search tendency, improve the global exploration ability and accelerate the DE algorithm. Mohamed and Mohamed [33] introduced an adaptive guided mutation rule in which the difference vector is formed by two randomly chosen vectors from the top and bottom 100p% individuals in the current population, to maintain the balance between the exploration and exploitation abilities during evolution. Islam et al. [50] proposed a modified DE with p-best crossover, called MDE_pBX. Its novel mutation strategy randomly selects the best individual from a dynamic group of 100 × q% individuals in the current population. Inspired by collective intelligence, Zheng et al. [51] proposed a new mutation strategy, referred to as ''current-to-ci_mbest/1'', to achieve better search capability. A collective vector x_{ci_mbest,i}^G is a linear combination of the top-ranking m population vectors in the current population whose fitness values are better than or equal to that of x_i^G, where m ∈ [1, i] is a random integer associated with x_i^G. He and Zhou [52] proposed a new mutation scheme named ''current-to-better/1'' by embedding CMA-ES into the ''current-to-pbest/1'' strategy.
This scheme strengthens both the exploration and exploitation of DE by guiding the search with a Gaussian distribution. Peng et al. [53] proposed a random neighborhood-based mutation strategy, ''neighbor/1'', and adopted the best of the neighbors as the base vector. According to the experimental results, neighborhood information is capable of striking a proper balance between global and local search during the search process of DE.
In addition to modifications of the mutation strategy and adjustments of the control parameter settings, enhancements to the crossover have also been investigated. To enhance the searching capability of DE, orthogonal crossover was introduced by Wang et al. [54], adapted with a quantization technique proposed by Leung and Wang [55], who had used it with the genetic algorithm (GA). Besides, considering that the binomial crossover widely used in DE is not rotationally invariant, Guo and Yang [56] suggested that a rotationally invariant crossover is more likely to generate better trial vectors. In addition, Wang et al. [22] presented a covariance matrix learning strategy to relieve the dependence of DE on the coordinate system and to improve the capability of DE to solve problems with high variable correlation. Another modification of the crossover, proposed by Islam et al. [50], incorporates a greedy parent selection strategy into the binomial crossover, named p-best crossover. The p-best crossover improves the performance of DE considerably.
Concerning the selection operator, Hoang [57] introduced a new probabilistic similarity-based method to improve DE's selection process and preserve the population diversity. Wang and Gao [58] designed a local selection operator by decomposing the high-dimensional problem into subcomponents and assigning a local fitness function to evaluate each subcomponent. Recently, Addawe [59] presented a mathematical analysis of a DE-based algorithm with an SA-like selection operator, which enhances the performance of conventional differential evolution by increasing the variance of the population to avoid convergence to a local optimum.

IV. ADAPTIVE DIFFERENTIAL EVOLUTION WITH SUCCESSFUL EXPERIENCE
In this section, the proposed ADEwSE algorithm, a JADE variant that embeds a successful experience vector into the ''current-to-pbest/1'' mutation rule, is described. The main innovations of this work are presented in detail as follows.
1) In classic differential evolution mutation strategies, each difference vector for perturbation is generated from a few individuals in the population, and this process does not take the historical evolution direction information into consideration. A novel mutation scheme that embeds a successful experience vector into the ''current-to-pbest/1'' mutation strategy is designed to improve the search ability of DE.
2) A sorting crossover matrix technique is added. The principle of sorting the crossover matrix is similar to the mechanism of sorting the crossover rate. Instead of assigning better individuals smaller CR values, the crossover matrix is sorted according to the real crossover rate, and the better individuals are then assigned binary strings with smaller real crossover rate values.
3) An opposition learning of crossover rate scheme is proposed to dynamically change the crossover rate, help switch between the exploration and exploitation phases of DE adaptively, and maintain the diversity of the population.
4) During the evolutionary search process, some target individuals in the DE population may stop generating better solutions. To solve this problem, a disturbance-based crossover is applied to the mutant vector v_i^G and the disturbance vector x_{d_i}^G.
5) At each generation, each individual x_i has an associated p_i generated randomly from a normal distribution.

A. ADAPTIVE ADJUSTMENT OF TOP p% VALUES
Since the ''DE/current-to-best/1'' strategy in Eq. 7 incorporates the best solution's information in the search process, it can guide the mutant vectors towards the best member of the population. However, it has been shown that this strategy may lead to premature convergence and poor performance on multimodal problems. Thus, Zhang and Sanderson [20] proposed the ''current-to-pbest/1'' mutation strategy, which can be described as follows:

v_i^G = x_i^G + F_i · (x_{pbest}^G − x_i^G) + F_i · (x_{r1}^G − x_{r2}^G)    (12)

where x_{pbest}^G is randomly selected among the top 100 × p% (p ∈ [0, 1]) individuals in the Gth generation, and F_i is the scale factor associated with individual x_i. The greediness of ''current-to-pbest/1'' is controlled by the parameter p, which can balance the exploitation and exploration abilities in the process of evolution.
In the original JADE, the parameter p, which controls the greediness of the ''current-to-pbest/1'' mutation strategy, is set manually and kept static during the whole evolution process. To improve the exploration ability of SHADE, each individual x_i has an associated p_i, selected randomly from a uniform distribution. In this article, the p_i value is set adaptively for each individual x_i based on a Gaussian perturbation scheme. The formula for generating p_i is:

p_i = randn_i(µ_p, 0.1)    (13)

where the mean value µ_p is initialized to 0.5. If a value of p_i outside [2/NP, 0.5] is generated, it is truncated to the nearest limit. At the end of each generation, µ_p is updated as follows:

µ_p = (1 − c_p) · µ_p + c_p · mean_A(S_p)    (14)

where S_p is the set of successful p values in the current generation, c_p is a learning rate in the range (0, 1), and mean_A(·) denotes the arithmetic mean. According to Eq. 13, the greediness is adjusted through the parameter µ_p instead of a fixed p value. Thus, µ_p is set to a large value at the beginning of the evolutionary process to maintain the diversity of the population. It is noteworthy that the update formula for µ_p in Eq. 14 uses the arithmetic mean of the successful p values, which biases µ_p towards small values. In other words, as the individuals converge, the parameter µ_p decreases to improve the local exploitation ability.
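The adaptive p scheme above can be sketched as follows; the function names are ours, and the truncation bounds follow the description in the text:

```python
import numpy as np

def sample_p(mu_p, NP, rng):
    """Draw p_i ~ N(mu_p, 0.1), truncated to [2/NP, 0.5] (Eq. 13 style)."""
    return float(np.clip(rng.normal(mu_p, 0.1), 2.0 / NP, 0.5))

def update_mu_p(mu_p, successful_p, c_p=0.1):
    """mu_p <- (1 - c_p)*mu_p + c_p * arithmetic_mean(S_p) (Eq. 14 style)."""
    if successful_p:                  # only update when S_p is non-empty
        mu_p = (1.0 - c_p) * mu_p + c_p * float(np.mean(successful_p))
    return mu_p
```

As convergence proceeds, the successful p values tend to be small, so repeated updates pull µ_p downward.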

B. DISTURBANCE BASED CROSSOVER
To give the differential evolution algorithm a chance of escaping from false optima and to improve the search efficiency, a disturbance-based crossover (DX) technique is incorporated into ADEwSE when a target vector is stagnant. Let ST denote a stagnation memory vector which records the number of consecutive unsuccessful updates of each vector x_i^G. The contents of ST are updated as follows:

ST_i^{G+1} = 0, if f(u_i^G) ≤ f(x_i^G);  ST_i^{G+1} = ST_i^G + 1, otherwise    (15)

When the ith target vector is stagnant, i.e., ST_i^G ≥ T, where T is a threshold, the DX is realized as follows:

u_{i,j}^G = v_{i,j}^G, if rand(0, 1) ≤ CR or j = j_rand;  u_{i,j}^G = x_{d_i,j}^G, otherwise    (16)

where the disturbance vector x_{d_i}^G is defined as:

x_{d_i}^G = x_i^G + δF · (x_{rp}^G − x_i^G)    (17)

where rp ≠ i is selected randomly from the individuals better than the target vector i, and δF, generated from a uniform distribution in the range (−0.1, 0.1), controls the disturbance direction and size. Note that, for the global best individual, the disturbance-based crossover is not conducted. The binary form of DX can be written as:

u_i^G = b_i^G ⊙ v_i^G + (1 − b_i^G) ⊙ x_{d_i}^G    (18)

In Section V, numerical experiments will be presented to provide a comprehensive study of the effectiveness of DX. According to Eqs. 15-18, the DX crossover operation introduces the information of better individuals, so the exploitation ability is enhanced.
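A possible sketch of the stagnation-triggered DX step, assuming the binary-string crossover of Eq. 16; names and the `rng` handling are our own:

```python
import numpy as np

def disturbance_crossover(pop, fit, i, v, CR, delta_F_range=0.1, rng=None):
    """Disturbance-based crossover (DX), a sketch of Eqs. 16-18: target
    components are replaced by a disturbed vector built from a randomly
    chosen individual that is better than x_i."""
    rng = np.random.default_rng(rng)
    fit = np.asarray(fit, float)
    better = np.flatnonzero(fit < fit[i])
    if better.size == 0:                        # global best: DX is skipped
        return None
    rp = int(rng.choice(better))
    dF = rng.uniform(-delta_F_range, delta_F_range)
    x_d = pop[i] + dF * (pop[rp] - pop[i])      # disturbed vector (Eq. 17)
    D = pop.shape[1]
    b = rng.random(D) <= CR                     # binary string (Eq. 9)
    b[rng.integers(D)] = True
    return np.where(b, v, x_d)
```

In the full algorithm this routine would only be invoked once the stagnation counter ST_i reaches the threshold T.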

C. A NOVEL MUTATION STRATEGY BASED ON SUCCESSFUL EXPERIENCE INFORMATION
In the selection phase of differential evolution, the objective function value of each trial vector is compared with that of the corresponding target vector in the current population. Therefore, we can give a formal definition of a successful trial vector at a given generation.
Definition 1 (Successful Trial Vector): Without loss of generality, in a minimization problem, if the trial vector u_i^G produced from its target vector x_i^G survives the selection step according to Eq. 11, then u_i^G is a successful trial vector.
Definition 2 (Successful Experience Vector): the successful experience vector δs_i^G of an individual is the difference vector between a successful trial vector u_i^G and its parent x_i^G.
According to Definition 2, the successful experience vector δs_i^G is a difference vector determined by the trial vector u_i^G and its parent, and it is updated as follows:

δs_i^{G+1} = u_i^G − x_i^G, if f(u_i^G) ≤ f(x_i^G);  δs_i^{G+1} = δs_i^G, otherwise    (19)

Note that the experience direction vector is not updated when the target vector x_i^G fails to generate a better offspring. Since the successful experience vector of each individual records the historical evolution direction information, it can improve the local exploitation of differential evolution. A successful experience memory δs can thus be established for all individuals and updated according to Eq. 19. The manner of updating the successful experience memory is presented in Algorithm 2.
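The SE memory update reduces to a few lines; the sketch below assumes NumPy arrays for the population, trial vectors and memory, with our own function name:

```python
import numpy as np

def update_se_memory(se, pop, trials, trial_fit, fit):
    """Successful-experience update (Eq. 19 style): when a trial vector
    survives selection, store the difference u_i - x_i as the evolution
    direction of individual i; otherwise keep the old entry."""
    for i in range(len(pop)):
        if trial_fit[i] <= fit[i]:
            se[i] = trials[i] - pop[i]
    return se
```

Entries corresponding to unsuccessful individuals are left untouched, which is exactly the behavior stated above.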

Obviously, although searching along these experience directions may lead to fitness improvement with a higher probability than following difference vectors generated by randomly selected individuals, the risk of being trapped at local best points is also increased. Hence, a novel mutation strategy, denoted ''current-to-pbest with SE/2'', is introduced:

v_i^G = x_i^G + K_i · λ_i · δs_{rd}^G + F_i · (x_{pbest}^G − x_i^G) + F_i · (x_{r1}^G − x_{r2}^G)    (20)

where δs_{rd}^G is a successful experience vector selected randomly from the experience memory; rd ∈ [1, NP] is the index of the chosen successful experience vector; r1 and r2 are selected randomly from the current population; K_i ∈ (0, 1) is an inertial weight factor used to control the effect of the successful experience vector; and λ_i is a new scale factor that adjusts the size and direction of the successful experience vector. One can observe from Eq. 20 that the incorporation of the successful experience information in the mutation strategy has two benefits. First, since the successful experience vector is calculated from the successful search direction of an individual, a better solution is found with higher probability by following the direction indicated by δs_{rd}^G. Second, the target vectors are not always attracted toward the pbest solutions found so far, which helps avoid premature convergence at local optima.
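A sketch of this mutation strategy follows; this is an illustrative rendering (the argument `lam_i` stands for the scale factor λ_i, and the function name is ours):

```python
import numpy as np

def mutate_with_se(pop, fit, se, i, F_i, K_i, lam_i, p, rng):
    """'current-to-pbest with SE' mutation (Eq. 20 sketch):
    v = x_i + K*lam*se_rd + F*(x_pbest - x_i) + F*(x_r1 - x_r2)."""
    NP = len(pop)
    top = max(2, int(round(p * NP)))                # top 100*p% individuals
    pbest = int(rng.choice(np.argsort(fit)[:top]))
    r1, r2 = rng.choice([j for j in range(NP) if j != i], 2, replace=False)
    rd = int(rng.integers(NP))                      # random SE entry
    return (pop[i] + K_i * lam_i * se[rd]
            + F_i * (pop[pbest] - pop[i]) + F_i * (pop[r1] - pop[r2]))
```

Setting K_i or λ_i to zero recovers the plain ''current-to-pbest/1'' behavior of Eq. 12.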

D. PARAMETER ADAPTION SCHEMES
In JADE, CR_i is generated at the beginning of each generation according to the following equation:

CR_i = randn_i(µ_CR, 0.1)    (21)

where randn_i(µ_CR, 0.1) returns a random value from a normal distribution with mean µ_CR and standard deviation 0.1. Whenever a value generated by Eq. 21 falls outside the range [0, 1], it is truncated to the nearest limit. At the end of each generation, the mean value µ_CR is updated as follows:

µ_CR = (1 − c) · µ_CR + c · mean_A(S_CR)    (22)

where S_CR is the set of successful CR values in the current generation, mean_A(·) is the arithmetic mean, and the learning rate c is a positive constant in the range (0, 1). The generation of F_i is similar to that of CR_i:

F_i = randc_i(µ_F, 0.1),  µ_F = (1 − c) · µ_F + c · mean_L(S_F)    (23)

where mean_L(·) is the Lehmer mean, mean_L(S_F) = Σ_{F∈S_F} F² / Σ_{F∈S_F} F; S_F is the set of F_i values that help the mutation strategy generate better offspring; and randc_i(µ_F, 0.1) denotes a Cauchy distribution with location parameter µ_F and scale parameter 0.1. After generating F_i, if F_i ≤ 0, Eq. 23 is executed again until an effective value is obtained; if F_i > 1, it is truncated to 1.0. In Eq. 20, the inertial weight parameter K is introduced to adaptively control the effect of the successful experience vector, and it is defined according to the following equations:

B_i = randn_i(µ_B, 0.1),  µ_B = (1 − c) · µ_B + c · mean_Pow(S_B)    (24)

where randn_i(µ_B, 0.1) generates B_i from a normal distribution with mean µ_B and standard deviation 0.1; S_B is the set of successful B values; and mean_Pow(S_B) = ((1/|S_B|) · Σ_{B∈S_B} B^1.5)^(1/1.5) is the power mean with n = 1.5, where |S_B| denotes the cardinality of the set S_B. After generating B_i, if B_i < 0 or B_i > 1, B_i is regenerated. To keep the exploration ability at the beginning of the search process, µ_B is initialized to 0. From Eq. 24, the weight K of the successful experience vector is reduced if certain individuals stagnate in the population.
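These adaptation rules can be sketched as follows; the helper names are ours, and the Lehmer mean follows the JADE convention:

```python
import numpy as np

def sample_cr(mu_cr, rng):
    """CR_i ~ N(mu_CR, 0.1), truncated to [0, 1] (Eq. 21)."""
    return float(np.clip(rng.normal(mu_cr, 0.1), 0.0, 1.0))

def sample_f(mu_f, rng):
    """F_i ~ Cauchy(mu_F, 0.1); regenerate while F_i <= 0, truncate at 1."""
    while True:
        f = mu_f + 0.1 * rng.standard_cauchy()
        if f > 0.0:
            return min(f, 1.0)

def lehmer_mean(s):
    """mean_L(S) = sum(x^2) / sum(x), used for the mu_F update."""
    s = np.asarray(s, float)
    return float((s * s).sum() / s.sum())

def update_means(mu_cr, mu_f, S_CR, S_F, c=0.1):
    """mu_CR <- (1-c)*mu_CR + c*mean_A(S_CR); mu_F uses the Lehmer mean."""
    if S_CR:
        mu_cr = (1 - c) * mu_cr + c * float(np.mean(S_CR))
    if S_F:
        mu_f = (1 - c) * mu_f + c * lehmer_mean(S_F)
    return mu_cr, mu_f
```

The Lehmer mean weights large successful F values more heavily than the arithmetic mean, which counteracts the downward bias of Cauchy sampling with truncation.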
In addition, to ensure that better individuals have more chance to explore their neighborhood, the B values are sorted and smaller B values are assigned to individuals with better fitness. Apart from the parameter K, a new scale factor α is used to control the size and direction of the successful experience vector. For each individual, α_i is generated by α_i = sgn(λ_i − rand(0, 1)) · A_i (25), where λ_i and A_i are the direction controller and scale factor, respectively; sgn(x) is the signum function, defined simply as 1 for x > 0 and −1 for x < 0; |S_α| is the cardinality of the set S_α of successful α values. If λ_i < 0 or λ_i > 1, λ_i is truncated to the interval [0, 1]. For the parameter A_i, if A_i > 1, it is truncated to 1; if A_i ≤ 0, it is regenerated by Eq. 25. At the early stage of the search, if µ_λ is close to 1, then λ_i generated from the Gaussian distribution exceeds 0.5 in most cases. This means that sgn(λ_i − rand(0, 1)) equals 1 in the majority of cases, so the mutant vector v_i^G tends to evolve along the successful experience directions that generated better offspring in the last generation. In our view, a larger initial value of µ_λ is not conducive to balancing the exploration and exploitation abilities; in this work, µ_λ is set to 0.5 at the initial stage of the algorithm. The parameter A is controlled by the location parameter µ_A. To weaken the risk of premature convergence, it is necessary to decrease the scale factor of the successful experience vector and thereby increase the randomness of the algorithm. Hence, similar to the direction controller λ, it is proper to set the location parameter µ_A = 0 at G = 1.
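A minimal sketch of generating α_i follows. The symbols λ and A, the function name, and the sampling details are our interpretation of the garbled passage; the sketch only shows the sign/magnitude split described above.

```python
import math
import random

def sample_alpha(mu_lam, mu_a):
    """Sketch of the SE-vector scale factor alpha_i = sgn(lambda_i - rand)*A_i
    (our reading of Eq. 25): lambda_i controls the direction, A_i the size."""
    # direction controller lambda_i ~ N(mu_lambda, 0.1), truncated to [0, 1]
    lam = min(1.0, max(0.0, random.gauss(mu_lam, 0.1)))
    # magnitude A_i ~ Cauchy(mu_A, 0.1): truncate to 1 if > 1, redraw if <= 0
    while True:
        a = mu_a + 0.1 * math.tan(math.pi * (random.random() - 0.5))
        if a > 1.0:
            a = 1.0
        if a > 0.0:
            break
    # sgn(lambda_i - rand(0,1)) decides whether the mutant follows or
    # opposes the successful-experience direction
    sign = 1.0 if lam - random.random() > 0 else -1.0
    return sign * a
```

With µ_λ = 0.5 the sign is positive roughly half the time, which matches the stated goal of not over-committing to past directions early on.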

E. SORTING CROSSOVER MATRIX SCHEME
In differential evolution, the crossover rate controls how many components of the newly generated trial vector come from the mutant vector [21]. In the crossover operation, retaining the components of better individuals in their trial vectors increases the probability of generating better solutions. Zhou et al. [28] introduced a modification to JADE by sorting the crossover rates: the CR set and the population are sorted, and a smaller CR value is reassigned to an individual with a smaller fitness value. Their experimental results show that sorting the crossover rate can enhance the global search ability of JADE. In this section, we introduce another sorting strategy, based on the real crossover rate.
At each generation, a crossover matrix b^G is generated by using Eq. 10, so the real crossover rate is calculated as CR_i^real = (Σ_{j=1}^{D} b_{i,j}^G) / D (26). In the proposed sorting crossover matrix scheme, b and CR are sorted according to the CR^real values, and the population is sorted by fitness values. Finally, the rows of b and the CR values with smaller CR^real are reassigned to individuals with better fitness. The procedure of the sorting crossover matrix technique is shown in Algorithm 4.
The crossover rate plays a vital role in keeping population diversity [18]. If CR is small, the components of the trial individual inherit more information from the parent vector than from its mutant vector; small CR values therefore increase the possibility of stagnation and may weaken the exploration ability of the algorithm. On the other hand, larger CR values increase population diversity, but reduce the stability of the search during evolution. Based on these considerations, and inspired by the opposition learning mechanism, opposition learning of the crossover rate is proposed in this section, which can switch between larger and smaller CR values dynamically and adaptively. During the iteration, if an offspring replaces its parent, some better gene information has been obtained. If CR_i was high, a small CR_i at the next generation helps the better genes survive and is more likely to generate better offspring. On the contrary, if CR_i was small, a large CR_i is used to increase the population diversity. Accordingly, the opposition learning of crossover rate (OLCR) technique is proposed. At the beginning of the search, the OLCR value is set as CR'_i = CR_i, and CR'_i is then updated as follows: CR'_i = 1 − CR_i if the trial vector u_i^G replaces x_i^G, and CR'_i = CR_i otherwise (27).

Algorithm 4 Procedure of Crossover Matrix
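A Python sketch of the sorting crossover matrix procedure and the OLCR flip follows. The function names are ours, binary list rows stand in for the crossover matrix b, and the exact bookkeeping is an illustration of the scheme described above rather than the authors' code.

```python
def real_crossover_rate(b_row):
    # fraction of trial components inherited from the mutant vector (Eq. 26)
    return sum(b_row) / len(b_row)

def sort_crossover_matrix(b, cr, fitness):
    """Sorting crossover matrix scheme (sketch of Algorithm 4): rows of b and
    CR values with smaller real crossover rate are reassigned to individuals
    with better (smaller) fitness."""
    by_cr = sorted(range(len(b)), key=lambda i: real_crossover_rate(b[i]))
    by_fit = sorted(range(len(fitness)), key=lambda i: fitness[i])
    new_b, new_cr = list(b), list(cr)
    for rank, ind in enumerate(by_fit):
        new_b[ind] = b[by_cr[rank]]
        new_cr[ind] = cr[by_cr[rank]]
    return new_b, new_cr

def olcr_update(cr, replaced):
    # opposition learning of CR: individuals whose trial replaced the parent
    # switch to the opposite rate 1 - CR_i in the next generation
    return [1.0 - c if ok else c for c, ok in zip(cr, replaced)]
```

Both operations are O(NP · D + NP · log NP), which matches the complexity accounting in Section V-A.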
According to Eq. 27, when a target vector x_i^G is replaced by its trial vector u_i^G, the crossover process in the next generation is controlled by the opposition crossover rate CR'_i. The new crossover operation is defined as u_{i,j}^G = v_{i,j}^G if rand_j(0, 1) ≤ CR'_i or j = j_rand, and u_{i,j}^G = x_{i,j}^G otherwise (28). It should be noted that the OLCR technique modifies the crossover matrix only if a better solution is obtained by the evolutionary search. In addition, the CR' values are identical to the CR values at the beginning of the search, and are then updated by Eq. 27. The detailed pseudo-code of OLCR is shown in Algorithm 5. By incorporating the successful experience vector into the ''current-to-pbest/1'' mutation strategy, together with the self-adaptive control parameter setting method introduced above, we can now present the entire ADEwSE framework as follows.

1) INITIALIZATION
Set the generation counter G = 0, randomly generate an initial population in the feasible solution region, and initialize the successful experience vector memory according to Algorithm 3. The other relevant parameters are given as follows: NP = 0.5, µ_CR = 0.5 (the initial mean of the normal distribution for the crossover rate CR), µ_F = 0.5 (the initial location parameter of the Cauchy distribution for the scale factor F), µ_A = 0 (the initial location parameter of the Cauchy distribution for the scale factor α), µ_B = 0 (the initial mean of the normal distribution for the weight parameter β), and µ_p = 0.5 (the initial mean of the normal distribution for the proportion of pbest individuals).

2) MUTATION
For every individual x_i^G, a mutant vector is generated by Eq. 20. In the early stage of the evolution process, the mean value µ_B and the location parameter µ_A are close to zero, so a small α is generated; only a small step along the direction of successful experience is taken, and the diversity of the population can thus be maintained. In addition, a larger p value is beneficial for enlarging the search region and improving the exploration ability at the beginning.

3) CROSSOVER
For each target vector x_i^G and its mutant vector v_i^G, a trial vector u_i^G is generated by the binary crossover operation expressed as Eq. 10 in Section II. The CR values and the crossover matrix b are sorted according to the real crossover rate CR^real, and each individual in the population is assigned a binary string in matrix b based on its fitness value. In addition, the crossover matrix b is repaired with the opposition learning technique if a successful search was obtained in the last generation. Finally, for individual i, if ST_i^G > T, DX is performed.

4) SELECTION
As in classic DE, the fitness value of the trial vector u_i^G is compared with that of its target vector x_i^G, and the better one is chosen as x_i^{G+1} to enter the next generation. The pseudo-code of ADEwSE is presented in Algorithm 6. In addition, an improved ADEwSE obtained by introducing a population size reduction scheme is presented in Section V.

V. EXPERIMENTAL RESULTS AND COMPARISONS
In this section, the computational results of the proposed algorithm are presented and discussed, along with comparisons against advanced DE variants and other ''current-to-pbest/1'' based adaptive differential evolution algorithms.

A. ALGORITHM COMPLEXITY ANALYSIS
In general differential evolution algorithms, including ADEwSE, the computational complexity of mutation and crossover is O(NP · D). Like JADE, the complexity of finding all pbest solutions in ADEwSE is O(NP · log(NP)). Besides, there are O(NP) operations in both selection and parameter adaptation. Furthermore, supposing that the fitness values of all trial vectors are better than or equal to those of their parents in each generation, updating the successful experience vector memory costs O(NP · D) floating-point operations. Additionally, opposition learning of the crossover rate and sorting of the crossover matrix cost O(NP · D + NP · log(NP) + NP) logical operations. The total complexity of ADEwSE is therefore O(NG · NP · (3D + 2 · log(NP) + 2)), where NG is a fixed number of generations. After omitting the lower-order terms, the computational complexity of ADEwSE is O(NG · NP · D), which is the same as classic differential evolution and does not require excessive computational cost.
In addition, compared with classic DE algorithms, ADEwSE requires more RAM for storing the successful experience vectors: for given NP and D, ADEwSE consumes an extra NP · D double-precision numbers.

B. EXPERIMENTS SETUP, PARAMETER SETTINGS AND INVOLVED ALGORITHMS
To test the performance and viability of the proposed ADEwSE algorithm, ADEwSE is applied to the optimization benchmark functions presented for the CEC2017 competition; a detailed description of the test functions can be found in [60]. According to their characteristics, these functions can be classified into four classes: (1) unimodal functions: f01, f03; (2) basic simple multimodal functions: f04−f10; (3) hybrid functions: f11−f20; (4) composition functions: f21−f30. For persuasive comparisons, experiments with the compared algorithms were conducted on this test suite. In this study, the solution error measure f(x) − f(x*) is adopted, in which x is the best result obtained by an algorithm in one run and x* is the global optimum of each benchmark function. As suggested in [61], error values and standard deviations smaller than 1E−8 are considered as zero. The dimensions (D) of the functions are 30, 50 and 100, respectively. The maximal number of function evaluations (FE) is set to 10,000·D in all compared algorithms, and each algorithm performs 51 independent runs on the same computer.
To perform a comprehensive evaluation, ADEwSE is compared to fourteen DE-based algorithms, namely: the differential evolution with self-adapting control parameters (jDE) [19], adaptive differential evolution with the optional external archive (JADE) [20], success-history based parameter adaptation for differential evolution (SHADE) [21], differential evolution based on covariance matrix learning and bimodal distribution parameter setting (CoBiDE) [22], differential evolution with multi-population based ensemble of mutation strategies (MPEDE) [18], the adaptive guided differential evolution algorithm with novel mutation for numerical optimization (AGDE) [33], real-parameter unconstrained optimization based on the enhanced fitness-adaptive differential evolution algorithm with novel mutation (EFADE) [62], the novel mutation strategy for enhancing SHADE and LSHADE algorithms for global numerical optimization (EDE) [29], differential evolution with ranking-based mutation operators (JADErank) [63], repairing the crossover rate in adaptive differential evolution (JADErcr) [21], adaptive differential evolution with sorting crossover rate for continuous optimization problems (JADEsort) [28], an improved adaptive differential evolution algorithm for continuous optimization (pbestrr-JADE) [64], and distance-based parameter adaptation for success-history based differential evolution (DbSHADE) [65]. To compare and analyze the solution quality of the various algorithms effectively from a statistical angle, the results are compared by adopting two nonparametric statistical hypothesis tests, namely, the multi-problem Wilcoxon signed-rank test and the Friedman test. The significance level is set to 0.05. For the multi-problem Wilcoxon signed-rank test, R+ and R− denote the sums of ranks for test problems on which the first algorithm performs better than, or worse than, its competitor, respectively; a larger rank sum indicates a larger performance discrepancy.
Regarding the Friedman test, algorithms with smaller rankings have better performance according to the final rankings over all functions.
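For concreteness, the R+ / R− statistics of the multi-problem Wilcoxon signed-rank test can be computed as follows. This is a self-contained sketch (not from the paper): smaller error means better here, so per-problem differences where the first algorithm has the smaller error count toward R+; zero differences are dropped and tied |d| share average ranks, as is conventional.

```python
def wilcoxon_ranks(err_a, err_b):
    """Return (R+, R-) for algorithm A vs. B over a list of per-problem errors."""
    diffs = [a - b for a, b in zip(err_a, err_b) if a != b]  # drop zero diffs
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # group ties in |d| and give them their average rank
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d < 0)   # A's error smaller: A wins
    r_minus = sum(r for r, d in zip(ranks, diffs) if d > 0)  # B's error smaller: A loses
    return r_plus, r_minus
```

The p-value would then come from the signed-rank null distribution (e.g. a statistics package); only the R+ / R− bookkeeping is shown here.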

C. COMPARISON AGAINST ADVANCED DIFFERENTIAL EVOLUTION VARIANTS
In this part, the comparison is performed between ADEwSE and nine advanced DE variants (i.e., jDE, JADE, SHADE, …); the results for the 30D, 50D and 100D problems are listed in Tables 2, 3, and 4, respectively. They include the obtained mean (denoted by Mean) and standard deviation (denoted by SD) of the error from the optimum solution of ADEwSE and the other nine advanced DE variants over 51 runs for all 29 benchmark functions. The best results are marked in bold for all problems. In addition, the numbers of best results (Nob) are summarized at the bottom of the tables. The multi-problem Wilcoxon signed-rank and Friedman tests between ADEwSE and the other competitors are shown in Tables 5, 6 and 7, respectively.
Furthermore, the results of the multi-problem Wilcoxon test presented in Table 5 clearly show that ADEwSE obtains higher R+ than R− values in all cases, and all p-values are less than 0.05, demonstrating that ADEwSE significantly outperforms the other compared advanced DE variants. Besides, Table 6 shows that ADEwSE obtains the first ranking among all compared algorithms for all 29 problems at 30D, 50D and 100D, based on the average ranking calculated by the Friedman test. To analyze the performance of all algorithms on all functions across dimensions, the mean aggregated rank of all ten algorithms over all 29 problems and all dimensions (30D, 50D and 100D) is presented in Table 7. The statistical results show that ADEwSE is the best, followed by SHADE as second best among all compared algorithms.
Moreover, to demonstrate the superiority of the proposed algorithm intuitively, six representative CEC2017 test functions with 100D are chosen to plot the convergence graphs of the ten algorithms with respect to the mean fitness values. The convergence graphs plotted in Figure 1 show that ADEwSE obtains the best results in most cases, although the convergence speed of ADEwSE is slower than that of JADE and SHADE at the early stage of the search. The reason is that the modified parameter settings, such as opposition learning of the crossover rate, strengthen the exploration ability and help avoid trapping in a local optimum.
Overall, the above experimental results, comparisons and analysis indicate that the proposed ADEwSE is a highly competitive optimization algorithm with good search quality, efficiency, and robustness for solving unconstrained global optimization problems.

D. COMPARISON WITH OTHER ''CURRENT-TO-PBEST/1'' BASED ADAPTIVE DIFFERENTIAL EVOLUTION ALGORITHMS
In this subsection, considering that the mutation strategy in ADEwSE is based on the classical ''current-to-pbest/1'' strategy, ADEwSE is compared with eight other ''current-to-pbest/1'' based adaptive DE algorithms; the results for the 30D, 50D and 100D problems are listed in Tables 8, 9, and 10, respectively, in which the minimum results are marked in bold. Besides, the number of cases in which each algorithm performs best is listed at the bottom of the tables. Table 11 and Tables 12 and 13 present the statistical results obtained by the multi-problem Wilcoxon signed-rank and Friedman tests, respectively. The histogram of the number of best results of each algorithm is shown in Figure 3.
From Tables 8, 9 and 10, it can be seen that at D = 30, ADEwSE obtains the best results in 25 cases, while JADE, SHADE, JADErank, JADEcr, JADEEP, JADEpbestrr, JADEsort, and DbSHADE obtain the best results on 4, 5, 4, 4, 2, 4, 6, and 5 functions, respectively. For the 50D functions, JADE, SHADE, JADErank, JADEcr, JADEEP, JADEpbestrr, JADEsort, and DbSHADE achieve the best results in 3, 2, 3, 2, 0, 3, 3, and 2 cases, respectively, while ADEwSE achieves 24. For the 100D functions, ADEwSE gets the best results in 19 cases, while JADE, SHADE, JADErank, JADEcr, JADEEP, JADEpbestrr, JADEsort, and DbSHADE achieve 2, 3, 2, 1, 0, 1, 2, and 6, respectively. In addition, the number of best results achieved by each algorithm over all dimensions, as depicted in Figure 3, further demonstrates the superior performance of the proposed algorithm. Furthermore, Table 11 summarizes the statistical results of the multi-problem Wilcoxon test between ADEwSE and the other ''current-to-pbest/1'' based algorithms: ADEwSE obtains higher R+ than R− values in all cases, and according to the Wilcoxon test at α = 0.05 and α = 0.1, significant differences are found in all cases, which means that ADEwSE performs significantly better than the other compared algorithms. Additionally, Table 12 lists the average rankings of ADEwSE and the other ''current-to-pbest/1'' based adaptive differential evolution algorithms: ADEwSE ranks first among the nine methods for the 29 test problems at 30D, 50D and 100D, based on the average ranking achieved by the Friedman test. Furthermore, to evaluate the performance of the different algorithms across all dimensions, the mean aggregated rank of all nine algorithms over all dimensions and problems is shown in Table 13. According to these statistical results, ADEwSE is the best, followed by DbSHADE as second best among all algorithms.
In summary, the performance of the proposed ADEwSE is significantly better than, or at least highly competitive with, those eight well-known ''current-to-pbest/1'' based adaptive algorithms on CEC2017 with 30D, 50D, and 100D.

E. COMPARISON OF DIFFERENT MUTATION STRATEGIES
The statistical results of ADEwSE, JADE, and JADEsort with the four mutation strategies are listed in Table 14. Additionally, the comparison results obtained by the multi-problem Wilcoxon signed-rank and Friedman tests are reported in Tables 15 and 16. Table 14 shows that ADEwSE-S3 (standard ADEwSE) obtains the best results in 13 cases, while ADEwSE-S1, ADEwSE-S2, and ADEwSE-S4 achieve the best results in 4, 6 and 9 cases, respectively, and JADE-S1, JADE-S2, JADE-S3, JADE-S4, JADEsort-S1, JADEsort-S2, JADEsort-S3, and JADEsort-S4 achieve the best results in 2, 1, 3, 2, 0, 0, 3, and 2 cases, respectively. Furthermore, as shown in Table 15, ADEwSE-S3 performs better and obtains higher R+ than R− values in all dimensions. According to the statistics at α = 0.05 and α = 0.1, significant differences can be observed in 12 and 12 cases, respectively, for the functions at 50D. Note that no significant difference is found between ADEwSE-S3 and ADEwSE-S2 or ADEwSE-S4, which means the three variants have similar performance on the optimization problems. The ranking values given by the Friedman test in Table 16 clearly show that the ADEwSE variants obtain better rankings than the JADE and JADEsort variants. In addition, Table 16 also shows that ADEwSE-S3 ranks first among all variants on the 50-dimensional problems, followed by ADEwSE-S1 and ADEwSE-S2.
All in all, comparing all 12 algorithms, the best is ADEwSE with the third mutation strategy, which improves the convergence precision in most tested cases. Finally, the superiority of the ADEwSE variants over the JADE and JADEsort variants is evident.

F. BENEFIT OF THE PROPOSED MODIFICATIONS TO THE PERFORMANCE OF ADEWSE
In this subsection, to comprehensively investigate the contributions of the proposed modifications, five variants of ADEwSE are constructed. The compared algorithms are listed as follows.
(1) Version 1: to study the effectiveness of the crossover matrix sorting scheme, the crossover matrix sorting strategy is not adopted. This version is denoted ADEwSE-1.
(2) Version 2: to study the effectiveness of the opposition learning of crossover rate (OLCR), all modifications are kept unchanged except the OLCR scheme. This version is denoted ADEwSE-2.
(3) Version 3: to study the effectiveness of the successful experience, the ''current-to-pbest'' mutation strategy is adopted and combined with the other modifications. This version is denoted ADEwSE-3.
(4) Version 4: to study the effectiveness of the disturbance-based crossover (DX), the DX scheme is not used to enhance the exploitation ability. This version is denoted ADEwSE-4.
(5) Version 5: to study the effectiveness of the adaptive p setting, the constant value p = 0.05 is used. This version is denoted ADEwSE-5.
The overall comparison results of ADEwSE against its five versions (ADEwSE-1, ADEwSE-2, ADEwSE-3, ADEwSE-4, and ADEwSE-5) are summarized in Table 18. From Table 18, it is clear that ADEwSE achieves the best results on 18 functions, while ADEwSE-1, ADEwSE-2, ADEwSE-3, ADEwSE-4, and ADEwSE-5 achieve that on 3, 7, 4, 4 and 4 functions, respectively. The numbers of best results thus indicate that the full set of modifications in ADEwSE helps improve the precision of the solutions. Besides, the multi-problem Wilcoxon test results shown in Table 17 demonstrate that ADEwSE obtains higher R+ than R− values in comparison with each of its five versions.
From these experimental results and discussion, it can be concluded that combining the proposed modifications improves the performance of the algorithm more effectively than using any single one.

G. VALIDITY ANALYSIS OF THE STAGNATION THRESHOLD T
In ADEwSE, the execution of DX is controlled by the stagnation threshold value T. To study the effect of this parameter, five variants with different T values (50, 100, 170, 300, and 500) are compared with the standard ADEwSE (T = 200). All other control parameter settings remain unchanged to allow a direct comparison. The experimental results are presented in Table 19, and the average ranking values obtained for each T are displayed in Table 20.
Firstly, the numbers of minimum results for each T, listed at the bottom of Table 19, indicate that the standard ADEwSE (T = 200) achieves the best results on 13 test functions, while T = 50, T = 100, T = 170, T = 300, and T = 500 obtain 3, 4, 13, 6, and 6 best results, respectively. It should be noted that T = 170 has performance similar to the standard ADEwSE in terms of the number of best results. Besides, Table 19 shows that T values that are too small or too large are unsuitable for ADEwSE.
The reason is that a small T value forces DX to be performed frequently, which significantly deteriorates the exploitation ability and makes the algorithm stagnate further. On the contrary, a large T value weakens the benefits of DX. In addition, the Friedman test results listed in Table 20 show that the standard ADEwSE with T = 200 achieves the first ranking. Finally, according to these experimental results and analysis, we conclude that T = 200 may be the most appropriate choice for most problems.

H. ENHANCING THE ADEWSE ALGORITHM BY USING A POPULATION SIZE REDUCTION STRATEGY
To further improve the performance of ADEwSE, a linear population size reduction scheme is introduced. As mentioned before, the population size significantly impacts the convergence rate of an evolutionary algorithm: a large population during the early stage of the optimization process strengthens the exploration ability, and reducing the population size afterwards helps refine the solution quality [66]. The scheme used to decrease the population size is as follows.

NP^{G+1} = round[NP_max + (NP_min − NP_max) · (currFE/maxFE)] (33)
where NP^{G+1} denotes the population size in the next generation, currFE is the current number of function evaluations (FE), and maxFE is the maximum number of function evaluations. The improved ADEwSE is named LADEwSE. From Eq. 33, it is obvious that the population reduction strategy is identical to that of LSHADE [46]. In LSHADE, NP_max is set to 18D, and the population size decreases from NP_max = 18D to NP_min = 4. However, since the adaptive adjustment of the top p% value, crossover matrix sorting, and opposition learning of CR already improve the exploration ability, LADEwSE is assigned a smaller NP_max than LSHADE, namely NP_max = 10D. Note that the parameter adjustments of LADEwSE are the same as in ADEwSE.
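The linear reduction of Eq. 33 translates directly into code (a one-line sketch; the argument names are ours): the population size falls linearly from NP_max at the first evaluation to NP_min at the last.

```python
def next_population_size(np_max, np_min, curr_fe, max_fe):
    # Linear population size reduction (Eq. 33): interpolate from
    # NP_max (curr_fe = 0) down to NP_min (curr_fe = max_fe)
    return round(np_max + (np_min - np_max) * (curr_fe / max_fe))
```

For LADEwSE at D = 30, this would start at NP_max = 10D = 300 and shrink toward NP_min, with the worst individuals dropped whenever the computed size falls below the current one.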
The results of the six state-of-the-art algorithms are adopted from their original papers.
Firstly, the overall comparison results of LADEwSE against ADEwSE and the six other powerful algorithms on the benchmarks with 30D, 50D and 100D are provided in Tables 21, 22 and 23, respectively. From Table 21, LSHADE-EpSin, ELSHADE-SPACMA, LSHADESPACMA, EBWO, EBLSHADE, JSO, ADEwSE, and LADEwSE obtain the best results on 9, 7, 9, 11, 8, 10, 8 and 9 functions, respectively. Concerning the 50D problems, LADEwSE obtains 10 best results, while LSHADE-EpSin, ELSHADE-SPACMA, LSHADESPACMA, EBWO, EBLSHADE, JSO, and ADEwSE obtain the best results on 4, 4, 13, 8, 4, 3 and 3 functions, respectively. With regard to the 100D problems, LSHADE-EpSin, ELSHADE-SPACMA, LSHADESPACMA, EBWO, EBLSHADE, JSO, ADEwSE, and LADEwSE achieve the best results in 4, 4, 13, 8, 4, 3, 3 and 10 cases, respectively. Based on the numbers of best results, one can conclude that LADEwSE and LSHADESPACMA perform well in all dimensions, especially for 50D and 100D; at D = 30, EBWO ranks first by the number of best results. Besides, Table 24 reports the results of the multi-problem Wilcoxon test between LADEwSE and the other competitive algorithms for the 30D, 50D and 100D problems. From Table 24, LADEwSE obtains higher R+ than R− values in most cases, with slightly lower R+ than R− values only in comparison with EBWO at 30D and LSHADESPACMA at 100D; even so, LADEwSE remains competitive with EBWO and LSHADESPACMA. According to the Wilcoxon test at α = 0.05 and 0.1, significant differences are found in 13 and 13 cases over all dimensions, respectively. Furthermore, the Friedman test results summarized in Table 25 clearly show that, among all algorithms, LADEwSE ranks first for the 30D and 50D test problems and second for the 100D problems.
In addition, the mean aggregated rank of all eight algorithms is presented in Table 26 to give a comprehensive comparison over all dimensions. From Table 26, it can be concluded that LADEwSE is the best, followed by LSHADESPACMA as second best among all algorithms. Overall, these results and discussion show that LADEwSE has better search quality and robustness for solving unconstrained problems. Moreover, according to the experimental and statistical results shown in Tables 21-26, LADEwSE is significantly better than ADEwSE on the 30D, 50D and 100D benchmark problems, which means that the linear reduction of population size is an efficient strategy for improving the search ability.
Although LADEwSE exhibits robust performance on most of the test problems, the precision results displayed in Tables 21-23 show that it fails to obtain higher solution precision than LSHADESPACMA in many cases. Besides, the initial population size is set by trial and error; a comprehensive investigation of this setting is left for future work.

VI. CONCLUSION
To strengthen the search ability of adaptive differential evolution, a modified mutation strategy that adds a successful experience vector to the ''current-to-pbest/1'' mutation strategy is proposed. The successful experience vector is calculated when a trial vector has better fitness than its associated parent. Furthermore, opposition learning of the crossover rate, the crossover matrix sorting scheme, and adaptive adjustment of the top p% values are added to the proposed algorithm to balance the exploration and exploitation abilities. In addition, a disturbance-based crossover (DX), which helps improve the exploitation ability of the algorithm as well as avoid stagnation, is presented. Extensive numerical experiments have been conducted. The results demonstrate that the proposed algorithm improves the global search ability and solution precision in comparison with advanced DE variants and ''current-to-pbest/1'' based adaptive DE algorithms. Besides, the validity of the modifications, such as crossover matrix sorting, opposition learning of the crossover rate, and the adaptive top p% value, is studied experimentally. In addition, the effectiveness of DX in handling stagnation is confirmed by using different threshold values T.
Considering that the strategies in ADEwSE improve the performance of DE in most cases, an improved ADEwSE with linear reduction of the population size is investigated. LADEwSE is more powerful, efficient, and robust than ADEwSE. Comparisons with six winning algorithms of the CEC2017 competition confirm that LADEwSE is statistically superior to, or at least competitive with, its competitor algorithms.
As a continuation of this research, we will focus on adapting the ADEwSE algorithm to multi-objective and constrained optimization problems and to practical engineering applications. Another possible direction is integrating the successful experience vector into other self-adaptive DE variants, such as jDE and SHADE.