Adaptive Differential Evolution with Information Entropy-based Mutation Strategy

In order to balance the exploration and exploitation abilities of differential evolution (DE), different mutation strategies for different evolutionary stages may be effective. An adaptive differential evolution with an information entropy-based mutation strategy (DEIE) is proposed to divide the evolutionary process reasonably. In DEIE, the number of Markov states, deduced from the crowding strategy, is determined first, and then the transition matrix between states is inferred from the historical evolutionary information. Based on this knowledge, the Markov state model is constructed. The evolutionary process is divided dynamically into exploration and exploitation stages using the information entropy derived from the Markov state model, and a stage-specific mutation operation is then employed adaptively. Experiments are conducted on the CEC 2013, 2014, and 2017 benchmark sets and on classical benchmark functions to assess the performance of DEIE. Moreover, the proposed approach is also applied to solve the protein structure prediction problem efficiently.


I. INTRODUCTION
DIFFERENTIAL evolution (DE), proposed by Storn and Price [1], is a competitive and popular population-based stochastic search algorithm. DE and its variants have made remarkable contributions to solving complex optimization problems [2]. Similar to other evolutionary algorithms, DE consists of three operations, i.e., mutation, crossover, and selection. The difference vectors of DE adapt the perturbation to the natural scales of the objective landscape in a random process [3]. This self-referential mutation provides DE with a tremendous speed advantage at the early stage. However, this property also makes DE sensitive to the loss of diversity, resulting in poor exploitation at the later evolutionary stage [4]. In terms of the mutation operator, various mutation strategies show distinct advantages in DE. Inappropriate mutation strategies may cause stagnation due to over-exploration or premature convergence due to over-exploitation [5].
Hence, how to balance exploration and exploitation is still an open issue in the evolutionary computation community.
For the exploration and exploitation conundrum in DE, it may be feasible to divide the evolutionary process into different stages to balance exploration and exploitation. Many approaches have been developed to improve the performance of DE through the division of evolutionary stages. For example, a fixed number of iterations is used as a division criterion, such as the two stages proposed by Liu et al. [6] and the three stages introduced by Cheng et al. [7]; suitable mutation strategies are then used in each stage. Although the performance of DE may be improved, such empirical guidelines are sometimes unreliable and lack universality. Yu et al. [8] focused on new metrics that represent the relationship between the order of fitness values and distance to divide the evolutionary process into two stages with corresponding parameter adjustment mechanisms and strategies. Tang et al. [5] presented a variant with an individual-dependent mechanism, in which the search process is separated into two stages so that mutation strategies can be designed for specific stages; the algorithm enters the later stage according to a defined success rate. Fan and Yan [9] introduced a self-adaptive DE called ZEPDE: after DE/rand/1 is employed in the first stage, a mutation strategy with zoning evolution is assigned to each individual according to a selective probability in the second stage. Zhan et al. [10] proposed a master-slave multipopulation distributed framework in which three co-evolving populations adaptively choose suitable mutation strategies based on evolutionary state estimation; the evolutionary state is estimated as one of two states by distance computations between the two individuals with the best and the median fitness values. Li et al. [11] designed an evolutionary state estimation method based on the correlation coefficient between the population distributions in objective space and solution space.
Then, the evolutionary process is divided into three kinds of states. Zhou and Zhang [12] proposed an underestimate model based on abstract convex theory, in which the variation in the average estimation error is used to divide the evolutionary process into three stages with a corresponding strategy candidate pool.
Instead of dividing the process into irreversible multiple stages, it is better to divide the evolutionary stage dynamically based on the search behaviour of the population, which requires a better understanding of the population dynamics. In the past few years, entropy has been utilized as an evaluation criterion to measure certain properties of the population or the evolutionary process, as follows. Based on the diversity defined by genotypic and phenotypic entropies, Naghib and Nobakhti [13] designed a fully adaptive DE with an adaptive rule for the parameters. Wu et al. [14] proposed a diversity metric based on crowding entropy to sustain the diversity of Pareto optimality. Similarly, Zhang et al. [15] utilized an entropy diversity method to adaptively monitor population diversity. Chen et al. [16] coupled the DE algorithm with entropy to solve multi-mode resource-constrained project scheduling, where entropy based on activity durations is used as a measure of uncertainty to ensure the feasibility of the project despite unexpected events. Ali et al. [17] proposed multi-level thresholding for image segmentation by integrating the DE algorithm and Kapur entropy.
The motivation behind this research is to propose an adaptive differential evolution with an information entropy-based mutation strategy (DEIE), which realizes a dynamic division of the evolutionary stages based on entropy and adapts stage-specific mutation strategies to obtain a trade-off between exploration and exploitation. To be specific, the Markov states are obtained first, and the Markov state model is constructed from the historical evolutionary information across generations to describe the frequency of state transitions. Subsequently, an information entropy metric is proposed to estimate the extent to which the population explores the solution space, which is mainly used for the dynamic division of the evolutionary stages. Then, stage-specific mutation strategies are adopted to exploit their respective advantages in the exploration or exploitation stage. Compared to other DE variants, the contributions of this paper are: (1) A dynamic division of the evolutionary stages of DE based on an information entropy metric is realized, in the hope of achieving a trade-off between exploration and exploitation. (2) The information entropy metric is designed using the historical evolutionary information across generations. The switching of evolutionary stages is realized in a statistical sense, while the population is allowed to choose strategies adaptively at the individual level. (3) DEIE can be extended to other real-life applications. On the basis of the stage division, the corresponding mutation strategies can be adjusted or replaced flexibly according to different application scenarios. Moreover, the proposed DEIE is tested on the CEC 2013, 2014, and 2017 test sets, classical benchmark functions, and a real-world case.

II. BACKGROUND
A. DIFFERENTIAL EVOLUTION
DE consists of mutation, crossover, and selection operations [1]. Starting from a random initial population of N P individuals, better individuals are retained whereas inferior ones are eliminated.
The main operations of one DE variant, namely DE/rand/1/bin, are shown below.
1) Initialization: The population $P^g = \{x_1^g, \cdots, x_i^g, \cdots, x_{NP}^g\}$ is randomly produced from the solution domain, where $g$ denotes the $g$th generation.
2) Mutation: Two individuals randomly selected from the population are used as the perturbation of the base vector, and the perturbation is weighted to produce the mutant individual:

$$v_i^g = x_{rand_1}^g + F \cdot (x_{rand_2}^g - x_{rand_3}^g)$$

where $F > 0$ is the scaling factor, and the indices $rand_1$, $rand_2$, and $rand_3$ are chosen from $[1, NP]$ such that they differ from $i$ and from each other.
3) Crossover: The binomial crossover operator copies the $j$th parameter of the mutant individual $v_i^g$ to the corresponding element of the trial individual $u_i^g$ according to the crossover rate; otherwise, it is copied from the corresponding target individual $x_i^g$:

$$u_{i,j}^g = \begin{cases} v_{i,j}^g, & \text{if } rand(0,1) \le CR \text{ or } j = j_{rand} \\ x_{i,j}^g, & \text{otherwise} \end{cases}$$

where the crossover rate $CR$ is chosen from $(0, 1)$; $rand(0,1)$ is randomly generated from $[0, 1]$; $j \in [1, D]$; and $j_{rand}$ is an integer randomly drawn from $[1, D]$.
4) Selection: If the trial individual $u_i^g$ achieves a better function value than that of the target individual $x_i^g$, then $u_i^g$ replaces $x_i^g$ in the next generation; otherwise, $x_i^g$ is preserved:

$$x_i^{g+1} = \begin{cases} u_i^g, & \text{if } f(u_i^g) \le f(x_i^g) \\ x_i^g, & \text{otherwise} \end{cases}$$
where $f(u_i^g)$ and $f(x_i^g)$ are the function values of $u_i^g$ and $x_i^g$, respectively.
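The three operations above can be assembled into a complete DE/rand/1/bin loop. The following Python sketch is illustrative only (function and parameter names are ours, not the paper's), and it handles infeasible values by simple clipping:

```python
import numpy as np

def de_rand_1_bin(f, bounds, NP=50, F=0.5, CR=0.5, max_gen=200, seed=0):
    """Hedged sketch of classic DE/rand/1/bin; names are illustrative."""
    rng = np.random.default_rng(seed)
    D = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    # 1) Initialization: random population in the solution domain
    pop = lo + rng.random((NP, D)) * (hi - lo)
    fit = np.array([f(x) for x in pop])
    for g in range(max_gen):
        for i in range(NP):
            # 2) Mutation: v = x_r1 + F * (x_r2 - x_r3), with r1, r2, r3 != i
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3,
                                    replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])
            v = np.clip(v, lo, hi)  # pull infeasible values back into the domain
            # 3) Binomial crossover: take v_j with prob CR, always at j_rand
            j_rand = rng.integers(D)
            mask = rng.random(D) < CR
            mask[j_rand] = True
            u = np.where(mask, v, pop[i])
            # 4) Selection: keep the trial only if it is no worse
            fu = f(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    return pop[np.argmin(fit)], fit.min()
```

On a simple objective such as the 2-D sphere function, this loop drives the best function value toward zero within a few hundred generations.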
B. ENTROPY
The concept of entropy derives from thermodynamics and has been successfully applied to different fields of science and engineering. Entropy, introduced by Shannon [18], characterizes the uncertainty related to the occurrence of a random event, which is equal to its information content. In mathematics, let $X = \{x_i\}, i \in [1, n]$ be a discrete random variable representing the event of its occurrence, with probability denoted by $p_i$. Then, the entropy function $E$ can be defined as

$$E = -\sum_{i=1}^{n} p_i \log p_i.$$

Information entropy increases with the increase in uncertainty. As a result, the measure reaches its peak value when all the outcomes are equiprobable, i.e., $E_{max} = \log n$ when $p_i = 1/n$ for all $i$.
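As a minimal illustration of the definition, the entropy function can be implemented directly (zero-probability terms contribute nothing by convention):

```python
import math

def shannon_entropy(p):
    """Shannon entropy E = -sum(p_i * log p_i); terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

For a uniform distribution over n outcomes this returns log(n), its maximum; a certain outcome gives zero entropy.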

III. LITERATURE REVIEW
Although DE performs well on a wide variety of problems, it suffers from stagnation, premature convergence, and related issues [3], [19]. One direction of improvement is the modification of the mutation scheme. Mutation is the most important step of DE, as it produces new individuals in the population. Over the last few years, many modifications of the mutation scheme have been proposed. Many researchers have worked on new mutation strategies, which help to explore the search space by perturbing individuals and substantially influence the performance of DE. Many different mutation strategies, such as ranking-based [20], archive-based [21], niche-based [22], centroid-based [12], and neighborhood mutations [23], have been proposed to enhance the search capability of DE. However, each of these mutation strategies, being more explorative or more exploitative, seems to work for different tasks. Therefore, more attention has been paid to multiple mutation operators that combine strategies with both exploration and exploitation abilities.
Many approaches have been developed to improve the performance of DE by the cooperation of different mutation strategies. These algorithms can be roughly classified into three promising directions: 1) individual-specific strategy techniques; 2) subpopulation-specific strategy techniques; and 3) evolutionary stage-specific strategy techniques.
Methods in the first category aim to adaptively select mutation strategies for each individual from a strategy pool. These individual-strategy matching methods mainly include probability model-based, surrogate-assisted, and related approaches. Probability model-based DE updates selection probabilities based on successful historical experience [24]. The self-adaptive DE (SaDE) [25], DE with an ensemble of mutation strategies and parameters (EPSDE) [26], DE with a strategy adaptation mechanism (SaM) [27], and DE with adaptive strategy selection (CACDE) [28] can be considered to belong to this category. Surrogate-assisted DE utilizes valid simplified models to approximate the fitness function and is thus computationally inexpensive [29], [30]. Techniques for constructing surrogate models include kernel density estimation [31], the Kriging model [32], abstract convex underestimation [33], and so on.
Methods in the second category realize multiple operators of DE by utilizing various mutation strategies in different subpopulations. The DE with self-adaptive multi-subpopulations [34], [35], DE with three small indicator subpopulations and one large reward subpopulation [36], DE with role assignment [37], and SHADE (success-history based adaptive DE) with a subpopulation-based ensemble of mutation strategies belong to this category.
For methods in the last category, the main idea is to divide the entire search process into multiple stages and select suitable mutation strategies for each stage. Some early works divided the whole process by setting a fixed number of iterations, such as two stages [6] or three stages [7]. Although the performance of DE may be improved, such empirical guidelines are sometimes unreliable and lack universality. To accommodate the search characteristics of the evolutionary process of DE, researchers prefer to estimate the evolutionary states to distinguish the different stages. Yu et al. [8] discussed the relationship between the order of fitness values and distance to divide the evolutionary process into two stages. Zhan et al. [10] proposed a master-slave distributed framework based on evolutionary state estimation, where the evolutionary state is estimated as one of two states by distance computations between the two individuals with the best and the median fitness values. Li et al. [11] designed an evolutionary state estimation method based on the correlation coefficient between the population distributions in objective space and solution space; the evolutionary process is then divided into three kinds of states.
The proposed DEIE is distinct from the above evolutionary state-based adaptive operator selection methods, which rely on estimating the current population distribution. In DEIE, an information entropy metric is designed using the historical evolutionary information across generations, which reveals the trend of movement of individuals in the search space. This makes it reasonable to estimate the extent to which the population explores the solution space and then divide the evolutionary process into two stages.

IV. DEIE ALGORITHM
This section introduces the adaptive differential evolution with information entropy-based mutation strategy, named DEIE, which mainly consists of a dynamic stage division and a stage-specific mutation strategy adaptation technique.
The emergence of the local fitness landscape in multimodal problems is due to the fact that individuals in the population are scattered at the exploration stage of evolution. The differences between individuals are gradually reduced and the distribution becomes concentrated; thus, the exploitation stage can be determined by the property of the unimodal basins of the local fitness landscape [38]. Based on this property, it can be seen that the search dynamics of DE induce basin-to-basin transfer, where trial solutions may traverse from one attraction basin to another [39]. In consideration of the search behaviour of DE, these basins are defined as Markov states with respect to the partition of the solution space. In this way, the Markov state model is constructed using the historical evolutionary information across generations to describe the frequency of state transitions. Subsequently, the information entropy metric is proposed to estimate the extent to which the population explores the solution space, which is mainly used for the dynamic division of the evolutionary stages. Moreover, suitable mutation strategies are utilized to update offspring individuals in the different stages.

A. THE DETERMINATION OF MARKOV STATES
Due to the search behaviour of basin-to-basin transfer, several subdomains decomposed from the entire solution space are defined as Markov states in this paper. Inspired by the automatic clustering of the crowding strategy in the multimodal method [40], a learning process is designed to guide the entire population to split into several subpopulations located at different optima. In this way, multiple solution subspaces based on the final spatial positions of the individuals, namely the Markov states, are generated.
The learning process consists of archiving, crowding and clustering operations. After Gen iterations, K stable Markov states can be obtained.
A population $P^g = \{x_i^g\}, i = 1, 2, \cdots, NP$ is generated by the initialization operation, where $NP$ is the population size and $g$ is the generation count. The detailed procedure at one iteration is performed as follows.
1. Archiving operation: Mutation and crossover operations act on the target individual $x_i^g$ and generate the trial individual $v_i^g$. If $v_i^g$ has the minimum Euclidean distance to $x_j^g$ compared with the other individuals in $P^g$, then $v_i^g$ is added to the archive $A_j^g$ of $x_j^g$. Repeating the above steps for $i$ from 1 to $NP$, all trial individuals fall into their corresponding archives.
2. Crowding operation: The purpose of the crowding operation is to generate the new population. For each archive $A_j^g$, the final optimal individual $o_j^g$ and the corresponding radius $r_j^g$ are calculated as

$$o_j^g = \arg\min_{A_{j,t}^g} f(A_{j,t}^g), \quad t = 1, \cdots, l$$

$$r_j^g = \max_{t} d(A_{j,t}^g, o_j^g)$$

where $A_{j,t}^g$ is the $t$th trial individual of archive $A_j^g$, $f(A_{j,t}^g)$ is the function value of $A_{j,t}^g$, $d(A_{j,t}^g, o_j^g)$ is the Euclidean distance between $A_{j,t}^g$ and $o_j^g$, and $l$ is the size of $A_j^g$. Then, $x_j^g$ is replaced by $o_j^g$ to update the population.

3. Clustering operation: The objects to be clustered are the individuals of the new population, and the clustering criterion is based on their locations. Starting from the individual with the minimum function value, the clustering step is performed with $x_s^{g+1}$ as the center and $r_s^{g+1}$ as the radius in turn, according to ascending order of function value, where $s = 1, 2, \cdots, NP$. Each individual $x_i^{g+1}$ of the new population is assigned to the corresponding cluster on the basis of a distance criterion.
The clustering process is completed when all individuals are assigned to a corresponding cluster. These clusters, represented by their centers, are denoted as the Markov states $S$, with $K$ the number of Markov states. After $Gen$ iterations, the number $K$ tends to be stable.

Fig. 1 shows the learning of Markov states for the one-dimensional Rastrigin function at one iteration. The population is composed of six individuals. The parent individuals $P^g = \{1, 2, 3, 4, 5, 6\}$ are represented by black circles, whereas the trial individuals $\{1', 2', 3', 4', 5', 6'\}$ are marked by red circles; the indexes mark their respective order. Six archives, each with a parent individual and the corresponding trial individuals, are shown in Fig. 1b. The population updated with the optimal individuals of the archives is the new population shown in Fig. 1c. The new population is finally divided into three subpopulations by the clustering operation in Fig. 1d. Notably, the above-mentioned learning process is not strictly a classical clustering method; in this part, we only focus on dividing the entire population into several subpopulations and define the final clustering partition as the Markov states.

B. THE CONSTRUCTION OF THE MARKOV STATE MODEL
For a Markov state model with $K$ states, the counts of state transitions can be obtained using the historical evolutionary information across generations to build the transition matrix $T^g$. The element $t_{ij}^g$ in row $i$ and column $j$ of this matrix corresponds to the observed frequency of transitions from state $i$ at the $(g-1)$th generation to state $j$ at the $g$th generation:

$$t_{ij}^g = \frac{N(i^{g-1} \rightarrow j^g)}{N(i^{g-1})}$$

where $N(i^{g-1} \rightarrow j^g)$ is the number of state transitions from state $i$ at the $(g-1)$th generation to state $j$ at the $g$th generation, and $N(i^{g-1})$ is the number of individuals located in state $i$ at the $(g-1)$th generation.
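One iteration of the archiving, crowding, and clustering learning process described above could be sketched as follows. Where the text is ambiguous we make assumptions, flagged in the comments: the parent is included in its own archive, and the radius is the largest distance from the archive's best individual to any archive member.

```python
import numpy as np

def learn_states_one_iter(f, pop, trials):
    """Hedged sketch of one learning iteration (details are our assumptions)."""
    NP = len(pop)
    # 1) Archiving: each trial joins the archive of its nearest parent;
    #    the parent itself is assumed to be part of its own archive.
    archives = [[pop[j]] for j in range(NP)]
    for v in trials:
        j = int(np.argmin(np.linalg.norm(pop - v, axis=1)))
        archives[j].append(v)
    # 2) Crowding: the best individual o_j of each archive replaces the parent;
    #    r_j is assumed to be the largest distance from o_j to an archive member.
    new_pop, radii = [], []
    for arch in archives:
        arch = np.array(arch)
        o = arch[np.argmin([f(a) for a in arch])]
        new_pop.append(o)
        radii.append(np.max(np.linalg.norm(arch - o, axis=1)))
    new_pop, radii = np.array(new_pop), np.array(radii)
    # 3) Clustering: in ascending fitness order, each unassigned individual
    #    within radius r_s of a center x_s joins that center's cluster.
    order = np.argsort([f(x) for x in new_pop])
    labels = -np.ones(NP, dtype=int)
    K = 0
    for s in order:
        if labels[s] != -1:
            continue
        labels[s] = K
        d = np.linalg.norm(new_pop - new_pop[s], axis=1)
        labels[(labels == -1) & (d <= radii[s])] = K
        K += 1
    return new_pop, labels, K
```

Crowding never worsens any position (each archive contains its parent), and individuals from well-separated basins can never end up in the same cluster.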
For example, suppose there are 3 individuals in state 2 at generation $g-1$ ($N(2^{g-1}) = 3$), two of which move to state 1 at generation $g$ ($N(2^{g-1} \rightarrow 1^g) = 2$). Then $t_{21}^g = 2/3$.
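A transition matrix built from the state labels of two adjacent generations, reproducing the worked example above (with 0-indexed states, so state 2 becomes index 1), might look like this:

```python
import numpy as np

def transition_matrix(prev_states, curr_states, K):
    """Row-normalized transition counts between two adjacent generations:
    t_ij = N(i^{g-1} -> j^g) / N(i^{g-1}); empty rows stay zero."""
    T = np.zeros((K, K))
    for i, j in zip(prev_states, curr_states):
        T[i, j] += 1
    rows = T.sum(axis=1, keepdims=True)
    return np.divide(T, rows, out=np.zeros_like(T), where=rows > 0)
```

With three individuals in state 2 at generation g-1, two of which move to state 1 at generation g, the entry t_21 comes out as 2/3 as in the example.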

C. INFORMATION ENTROPY-BASED MUTATION STRATEGY ADAPTION
It is well known that the key operation of DE is mutation [41]. The mutation strategy utilized by DE largely governs its tendency to discover promising regions or to detect the optima. The population behaviour in different evolutionary stages influences the selection of mutation strategies to a certain extent. However, a problem remains: how to estimate the evolutionary stages and employ the stage-specific mutation strategy appropriately.
The mutation strategy adaption is performed as follows.
1. Information entropy metric: The information entropy based on the Markov state model describes the extent to which the population explores the solution space. Given the above analysis, the information entropy can be used to estimate the evolutionary stages.
On the basis of the transition matrix, the probability that the evolutionary process is undergoing a transition between any given pair of states can be estimated by

$$p_{ij}^g = \frac{t_{ij}^g}{\sum_{i=1}^{K}\sum_{j=1}^{K} t_{ij}^g}$$

where $\sum_{i=1}^{K}\sum_{j=1}^{K} p_{ij}^g = 1$. Then, the information entropy $E^g$ across all possible transitions can be calculated by summing the Shannon entropy over each possible transition at each iteration:

$$E^g = -\sum_{i=1}^{K}\sum_{j=1}^{K} p_{ij}^g \log p_{ij}^g$$

where $E^g$ denotes the information entropy of the $g$th generation.
The value of $E^g$ is normalized using $E_{max}$ and $E_{min}$ as

$$\bar{E}^g = \frac{E^g - E_{min}}{E_{max} - E_{min}}.$$

Shannon showed that this quantity achieves its maximum value when all $p_{ij}^g$ are equal to each other, so

$$E_{max} = \log(K \cdot K),$$

and $E_{min} = 0$, because individuals no longer undergo state transitions when the evolutionary process stabilizes or terminates.
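Putting the two formulas together, the normalized entropy Ē^g can be computed from a transition matrix as in this sketch (the guard for K < 2 or an all-zero matrix is our addition to avoid division by zero):

```python
import numpy as np

def normalized_entropy(T):
    """Normalized information entropy: E_bar = (E - E_min) / (E_max - E_min),
    with E_min = 0 and E_max = log(K*K) for a K x K transition matrix."""
    K = T.shape[0]
    total = T.sum()
    if K < 2 or total == 0:
        return 0.0  # no transitions (or a single state): zero entropy
    p = T / total                 # p_ij over all K*K transition pairs
    nz = p[p > 0]                 # zero-probability pairs contribute nothing
    E = float(-np.sum(nz * np.log(nz)))
    return E / np.log(K * K)
```

A matrix with all transitions equiprobable gives Ē = 1, while a matrix concentrated on a single transition gives Ē = 0, matching the two extremes described in the text.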

2. Stage estimation
In accordance with the aforementioned property of the information entropy, the evolutionary stage of each individual can be estimated as

$$\Psi_i = \begin{cases} \text{exploration}, & \text{if } rand(0,1) < \bar{E}^g \\ \text{exploitation}, & \text{otherwise} \end{cases}$$

where $\Psi_i$ represents the estimated evolutionary stage of individual $x_i^g$. There are several reasons for the above stage division.
(1) From the perspective of the population, different stages may coexist in the same generation, so the stage of each individual should be estimated separately.
(2) A large $\bar{E}^g$ is caused by the population being frequently transferred between several states, which indicates that the solution space is being explored extensively. In this case, the individual stage is likely to be estimated as the exploration stage because the population explores different regions, and the mutation strategy DE/rand/1, with good exploration capability, is more suitable. On the contrary, the population concentrates on some parts of the solution space for exploitation when the value of $\bar{E}^g$ is small. The individual stage can then be estimated as the exploitation stage, and the mutation strategy DE/best/1, with good exploitation capability, can be employed to detect the optima.
(3) When the population settles in a certain state (a locally optimal solution region), the individuals are no longer transferred between states. In this case, the value of the information entropy is zero, which results in rapid convergence of the population.
3. Mutation strategy and control parameter selection: The mutant individual $v_i^g$ is generated according to the stage of the target individual $x_i^g$:

$$v_i^g = \begin{cases} x_{rand_1}^g + F_i^g \cdot (x_{rand_2}^g - x_{rand_3}^g), & \text{if } x_i^g \text{ is in the exploration stage (DE/rand/1)} \\ x_{best}^g + F_i^g \cdot (x_{rand_1}^g - x_{rand_2}^g), & \text{if } x_i^g \text{ is in the exploitation stage (DE/best/1)} \end{cases}$$

where $x_{best}^g$ is the best individual in the population at generation $g$, $F_i^g \in (0, 1]$ is the scaling factor for $x_i^g$, and $rand_1$, $rand_2$, and $rand_3$ are randomly chosen from $[1, NP]$ such that they differ from $i$ and from each other.
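A hedged sketch of the stage-specific mutant generation, drawing each individual's stage with probability Ē^g as in the stage-estimation rule above (the exact difference-pair indices for DE/best/1 follow the standard form and are assumed here):

```python
import numpy as np

def stage_specific_mutant(pop, fit, i, E_bar, F=0.5, rng=None):
    """Sketch: an individual is treated as explorative with probability E_bar
    (DE/rand/1); otherwise it exploits around the current best (DE/best/1)."""
    rng = rng or np.random.default_rng()
    cand = [j for j in range(len(pop)) if j != i]
    if rng.random() < E_bar:
        # exploration stage: v = x_r1 + F * (x_r2 - x_r3)
        r1, r2, r3 = rng.choice(cand, 3, replace=False)
        return pop[r1] + F * (pop[r2] - pop[r3])
    # exploitation stage: v = x_best + F * (x_r1 - x_r2)
    best = pop[np.argmin(fit)]
    r1, r2 = rng.choice(cand, 2, replace=False)
    return best + F * (pop[r1] - pop[r2])
```

With Ē^g = 0 every individual is pulled toward the current best, while Ē^g = 1 recovers pure DE/rand/1 behaviour, which matches the intuition in points (2) and (3) above.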
A simple selection strategy for the control parameters $F$ and $CR$ based on the current stage division, inspired by [8], is designed for comparison. An explorative individual demands high $F$ and $CR$ values, whereas an exploitative one requires the opposite. The $F$ and $CR$ values for each individual at the $g$th generation are assigned accordingly.

D. DEIE ALGORITHM DESCRIPTION
The DEIE algorithm is described in Algorithm 1. After initialization, the K Markov states are first learned through the learning process of Gen generations. In the following iteration process, the individuals in the population undergo state transitions caused by the search behaviour of the population.
In line with the temporal ordering of the state transitions of individuals at two adjacent generations, the state transition probability is calculated, and the information entropy is then computed to observe the population dynamics. Based on the information entropy metric, the evolutionary stage of the current individual can be estimated, and the suitable mutation strategy is selected for the corresponding stage.

Algorithm 1 Pseudocode of the DEIE algorithm
Require: population size (NP), scaling factor (F), crossover rate (CR), learning period (Gen).
Ensure: Final population (P^g).
1: Initialization: set g = 0, generate the initial population P^g = {x_1^g, x_2^g, ..., x_NP^g}, evaluate the function value of each individual in P^g, and set the relevant parameters of the DEIE algorithm;
2: while the termination criterion is not satisfied do
3:     for g <= Gen do                          // learning process
4:         Generate trial individuals via DE/rand/1 and crossover;
5:         Perform the archiving, crowding, and clustering operations;
6:         g = g + 1;
7:     end for
8:     Determine the Markov states of the current population S = {S_1, ..., S_t, ..., S_NP}, S_t ∈ {1, ..., K}; set initial T^g = 0 and Ē^g = 0;
9:     for g > Gen do
10:        for i = 1 to NP do
11:            if rand(0,1) < Ē^g then
12:                Generate the mutant individual v_i^g via mutation strategy DE/rand/1;
13:            else
14:                Generate the mutant individual v_i^g via mutation strategy DE/best/1;
15:            end if
16:            Generate the trial individual u_i^g via crossover;
17:            Evaluate u_i^g and perform selection;
18:        end for
19:        Update T^g and Ē^g;
20:        g = g + 1;
21:    end for
22: end while
Notably, the Markov states are formed at the end of the learning process of Gen generations, yielding the number of Markov states K and the representative center points; the DE/rand/1 mutation strategy is employed throughout the learning process. Based on the K Markov states, the information entropy is calculated in the following iteration process. In addition, any infeasible solution is simply discarded and replaced with a new solution regenerated within the domain.

E. RUNTIME COMPLEXITY OF DEIE
Based on the above procedures, the runtime complexity of DEIE can be analyzed as follows. The determination of the Markov states contributes O(NP · NP · D) per generation of the learning process, and O(K · K) runtime is needed to calculate the information entropy in each subsequent generation. Hence, the total runtime complexity of DEIE is O(max(NP · NP · D · Gen, NP · K · D · G_max, NP · D · G_max, K · K · G_max)), where G_max is the maximum number of generations of the whole algorithm. Given that the number of states K is less than the population size NP, the final runtime complexity is O(NP · K · D · G_max). For comparison, the original DE algorithm is O(NP · D · G_max). According to the studies in [43]-[45], the runtime complexity of DEIE is relatively small compared with the cost of expensive function evaluations. Therefore, the proposed DEIE is acceptable for practical problems, especially expensive-to-evaluate ones.

V. EXPERIMENTAL RESULTS
To evaluate the performance of DEIE, the CEC 2013 [42], CEC 2014 [43], and CEC 2017 [44] sets are utilized in the following diverse experiments. The experimental results are presented in four subsections.
In Section V-A, the performance of DEIE is evaluated against that of eleven top-ranked DE variants. Section V-B presents the experiment on component analysis. The parameter study is described in Section V-C. Section V-D provides the real-life application of DEIE. Moreover, to show that the proposed DEIE works well on different test sets, 21 classical benchmark functions [45] are also used to test its performance against four state-of-the-art DE variants and three classical EAs. Some experiments on the classic benchmark functions are shown in the supplementary file. All algorithms cease when the number of function evaluations (FEs) exceeds the maximum number of FEs (MaxFEs), or when the function error reaches the predefined accuracy within the given MaxFEs.
The quantity (f(x) − opti) represents the function error, where f(x) is the function value of the solution x generated by the current algorithm and opti is the global optimum. In the following experiments, the predefined accuracy value is 1.00E−08. The results are averaged over 51 independent runs for each function and each algorithm. The parameters NP, F, and CR of DEIE are set to 50, 0.5, and 0.5, respectively. The number of Markov states K is determined automatically at the end of the learning process, and the iteration number of the learning process Gen is set to 100. In addition, the parameter settings of the other algorithms are identical to those in their original papers.

A. COMPARISON OF DEIE WITH TOP-RANKED DE VARIANTS
To evaluate the overall performance of DEIE on the CEC 2013 [42], CEC 2014 [43], and CEC 2017 [44] sets, several top-ranked DE variants are used as competitor algorithms. In this experiment, the termination criterion is set to MaxFEs = 10,000 × D, as suggested in [44]. First, DEIE is compared with four advanced DE variants on the CEC 2013 set, namely, SHADE [46], ZEPDE [9], IDE [5], and SinDE [47]. Table 1 reports the mean and Std values for the 30-D functions. DEIE produces good results on 11 out of 28 functions compared to all competitors, performs significantly better on 14, 17, 13, and 16 out of 28 functions, and exhibits similar performance on 4, 3, 3, and 3 functions, respectively. SHADE, ZEPDE, IDE, and SinDE show remarkably better performance than DEIE on 10, 8, 12, and 9 functions, respectively. The last row of Table 1 gives the analysis of the optimization performance obtained by Wilcoxon's test. DEIE outperforms ZEPDE and IDE (p-value < 0.05). SHADE, SinDE, and DEIE perform at the same level of optimization, but DEIE achieves the best results in 14 and 13 cases against SHADE and SinDE, respectively.
Second, DEIE is compared with four other advanced DE variants on the CEC 2014 set, as shown in Table 2, namely, L-SHADE [48], MC-SHADE [49], iLSHADE [50], and ADDE [10]. DEIE obtains good results on 11 out of 30 functions compared to all competitors. The results reveal that DEIE tends to perform better on unimodal and multimodal functions. DEIE significantly outperforms the others on 14, 21, 12, and 16 functions, respectively, and achieves equally good performance on 8, 4, 8, and 7 functions, respectively. However, L-SHADE, MC-SHADE, iLSHADE, and ADDE are clearly better than DEIE on 8, 8, 5, and 10 functions, respectively. DEIE is significantly better than MC-SHADE. Although the significance test shows that DEIE has no significant advantage over L-SHADE, iLSHADE, and ADDE, it obtains the best results in 14, 21, 12, and 16 cases, respectively.

B. EFFECTS OF DEIE COMPONENTS
The effectiveness of the proposed DEIE may depend on the entropy-based division of evolutionary stages; the basic mutation strategies DE/rand/1 and DE/best/1 are further used in the exploration and exploitation stages, respectively. To assess the validity of the proposed DEIE, experiments are conducted on the CEC 2017 benchmark set to identify the effect of each component. The results are presented in Table 4, where DEIE-rand and DEIE-best represent DEIE using only the DE/rand/1 strategy and DEIE using only the DE/best/1 strategy, respectively. As shown in Table 4, DEIE significantly outperforms the other DEIE variants on the majority of functions. Further results are provided in Table S-III (supplementary file). A function with worse results, such as the f9 function, has an information entropy curve that remains greater than zero even though it declines at the beginning. The curve of the f16 function indicates that the population continues to vacillate between at least two states; thus, no optimal solution can be found.

C. PARAMETER STUDY
In the proposed DEIE, the parameters Gen, K, NP, F, and CR need to be discussed. The experiments in this part are conducted on the 21 classic functions (see supplementary file Table S-I).
1) Learning period Gen and K: Considering that the K Markov states are formed at the end of the learning process, the learning period Gen must be chosen to balance the learning quality of the Markov states against the overall optimization of DEIE. A large Gen can ensure accurate learning of the number of Markov states but consumes computational resources that could be spent exploring the optimal solution; conversely, a small Gen results in a coarse division. First, the relationship between the maximum learning period Gen and the number of Markov states K is investigated. Fig. 4 shows that the number of Markov states K decreases as the iterations progress, whereas K for most functions becomes stable after 100 iterations. To avoid diminishing the optimization effect through a large number of iterations, the maximum learning period Gen is set to a fixed value of 100. To further reveal the impact of Gen, the effect of the number of Markov states K on the optimization results is investigated, where K varies from 0 to 20 under Gen = 100 according to Fig. 4, and the other parameters are identical to those used in Section V-A. Table 5 reports the influence of different values of K on the performance of DEIE. It reveals that no significant differences exist under different values of K in DEIE. Therefore, Gen = 100 is a suitable choice.
2) Population Size NP
To investigate the impact of NP, six frequently used settings, i.e., 30, 40, 50, 60, 80, and 100, are employed in DEIE. The remaining parameter settings are the same as those described at the start of Section V. Table S-X (see supplementary file) presents the mean and Std values under different NP on the 30-D classical benchmark. Clearly, DEIE with NP = 50 obtains better performance than DEIE with the other NP settings. Furthermore, the Friedman rankings are summarized in Table 6: NP = 50 achieves the best results on 12 of the 21 functions and obtains the best overall ranking. Hence, NP = 50 is the most suitable setting for DEIE.
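The Friedman ranking used in Table 6 can be reproduced in outline as follows. The `errors` layout is a hypothetical example (one row per function, one column per NP setting); lower error receives rank 1 and ties share the average rank:

```python
import numpy as np
from scipy.stats import rankdata

def friedman_ranks(errors):
    """Average Friedman rank of each algorithm setting.

    `errors[f][a]` is the mean error of setting `a` on function `f`
    (an illustrative layout, not the paper's exact tables).
    """
    errors = np.asarray(errors, dtype=float)
    ranks = np.vstack([rankdata(row) for row in errors])  # rank per function
    return ranks.mean(axis=0)                             # average over functions
```

The setting with the lowest average rank is reported as the best-ranked configuration.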
3) Scaling factor F and crossover rate CR
The initial values of F and CR are set to 0.5, and 0.5 is reassigned to F or CR whenever its range is exceeded. The remaining parameter settings are the same as in DEIE. The results of Wilcoxon's test for DEIE_fixed (DEIE without adaptive parameters) and DEIE on the 30-D classical benchmark are shown in Table 7. Clearly, DEIE with adaptive parameters obtains better performance than DEIE_fixed on 14 functions. The mean and Std values of DEIE_fixed are shown in Table S-XI (see supplementary file). From these data, we can conclude that parameter selection based on the current stage division improves the performance of DEIE. Therefore, the parameter adaptation mechanism may be another promising direction for future work.
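The text states only that F and CR start at 0.5 and fall back to 0.5 whenever they leave their valid range. A minimal sketch of such a guarded update is shown below; the Gaussian perturbation is an illustrative stand-in, not DEIE's actual adaptation rule:

```python
import random

def update_parameter(value, low=0.0, high=1.0, sigma=0.1):
    """Perturb the current F or CR value and fall back to 0.5
    whenever the result leaves [low, high], mirroring the reset
    rule described in the text. The Gaussian step is an assumed
    update mechanism for illustration only."""
    candidate = value + random.gauss(0.0, sigma)
    return candidate if low <= candidate <= high else 0.5
```

The guard guarantees that both parameters always remain valid without clamping them to the boundary, which would bias them toward 0 or 1.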

D. REAL-WORLD APPLICATION
In this part, DEIE is utilized to solve a real-life problem, namely, the protein structure prediction (PSP) problem, which is essential in bioinformatics. Proteins are an important component of all cells and tissues in the human body, and their function is directly determined by their three-dimensional (3D) native structure. For example, the protein 4UEX is a structure of human saposin, an important auxiliary factor that helps acid hydrolytic enzymes degrade complex glycosphingolipids; serious metabolic diseases may result from deficiencies of saposin or the hydrolytic enzyme [56]. High-throughput, high-precision protein structure prediction technology will strongly promote the development of life science and greatly accelerate the development of anticancer and antiviral drugs, targeted drugs, and new proteases. The thermodynamic hypothesis of protein folding states that a protein's native structure is the conformation with minimal potential energy. The distinguishing features of the PSP problem are an inaccurate energy function and an expensive function evaluation cost. The state-of-the-art methods in this field, such as Rosetta [57] and I-TASSER [58], employ multistage Monte Carlo algorithms and their variants with a fixed computational cost for each stage. Two problems need to be considered: (1) For small proteins, a lower computational cost may be sufficient to generate native-like conformations, whereas a higher computational cost is needed for large proteins because of their vast conformation space; a fixed cost may therefore waste effort on small proteins while exploring large proteins insufficiently. (2) Without an effective strategy, the Monte Carlo search widely adopted in this field is prone to premature convergence.
The proposed DEIE is effective for tackling the PSP problem because: (a) the dynamic stage division technique adapts to proteins of different lengths; (b) entropy, which describes the disorder or uncertainty of a system, may be more applicable for solving imprecise function models from a macroscopic perspective; and (c) DEIE has good scalability, and the mutation strategy can be flexibly adjusted for different proteins.
A coarse-grained representation of the protein structure used in Rosetta [59] is adopted in DEIE, as shown in Fig.5; the dihedral angles of the residues are therefore used to encode the proteins. Following the latest prevalent trend, a composite energy constructed from physicochemical knowledge and spatial geometric knowledge is used to guide protein folding, expressed as

f = E_Rosetta + E_distance,

where f denotes the energy function combining the protein physicochemical model and the knowledge model, E_Rosetta is the Rosetta score3 physicochemical energy, and E_distance is the geometric model based on residue distances. A detailed description can be found in [57]. On the basis of the stage division, the operators of DEIE are redesigned flexibly for the current application scenario:
(1) Initialization. The initial population is generated through random dihedral angle perturbation.
(2) Mutation operation. PSP-specific versions of the DE/rand/1 and DE/best/1 mutation strategies are designed to accommodate the different stages; the details are given in the supplementary materials.
(3) Crossover operation. The trial conformation is generated by exchanging residue information between the target and mutant conformations.
(4) Selection operation. Following the Metropolis criterion, satisfactory offspring survive into the next generation.
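The composite energy and the Metropolis-based selection step above can be sketched as follows. The weight w and the temperature T are assumed illustrative parameters, not values taken from the paper:

```python
import math
import random

def combined_energy(e_rosetta, e_distance, w=1.0):
    """f = E_Rosetta + w * E_distance; the weight w is hypothetical."""
    return e_rosetta + w * e_distance

def metropolis_accept(e_trial, e_target, T=2.0):
    """Metropolis criterion for the selection step: always accept a
    lower-energy trial conformation, otherwise accept with
    probability exp(-(E_trial - E_target) / T)."""
    if e_trial <= e_target:
        return True
    return random.random() < math.exp(-(e_trial - e_target) / T)
```

Accepting some worse conformations with nonzero probability is what lets the search escape local minima of the inaccurate energy function, addressing the premature-convergence problem noted above.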
In this experiment, DEIE is compared with four algorithms on 50 nonredundant proteins with various amino acid sequence lengths. Two current representative prediction methods are Rosetta-d (a distance-assisted fragment assembly method) and L-BFGSfold (a distance geometry optimization method); two DEIE variants are DEIE_v1 (DEIE with only the PSP-specific version of the DE/rand/1 mutation strategy) and DEIE_v2 (DEIE with only the PSP-specific version of the DE/best/1 mutation strategy). Detailed descriptions of the experimental settings and compared methods can be found in the supplementary materials.
The predicted results generated by Rosetta-d, L-BFGSfold, DEIE_v1, DEIE_v2, and DEIE on all 50 benchmark proteins are listed in Table 8, and the detailed results for each protein are presented in Table S-XII of the supplementary materials. Two well-known structural quality measures are used to assess the similarity between the predicted conformation and a reference conformation, generally the native structure. One is the root mean square deviation (RMSD), where a smaller RMSD means a smaller deviation and better model accuracy. The other is the template modeling score (TM-score), which ranges in [0, 1]; a higher value reflects better folding accuracy, and a TM-score ≥ 0.5 indicates a correctly folded model. Compared with the two state-of-the-art methods, the average RMSD of DEIE (4.47 Å) is reduced by 24.63% relative to Rosetta-d (5.93 Å) and by 9.74% relative to L-BFGSfold (4.95 Å), and the average TM-score of DEIE (0.696) is 23.45% higher than that of Rosetta-d (0.564) and 3.83% higher than that of L-BFGSfold (0.670). Compared with the two DEIE variants, the average RMSD of DEIE (4.47 Å) is lower than that of DEIE_v1 (11.20 Å) and DEIE_v2 (5.43 Å), and the average TM-score of DEIE (0.696) is higher than that of DEIE_v1 (0.360) and DEIE_v2 (0.662). All the results in Table 8 show that the prediction accuracy of DEIE is significantly better than that of each compared method (with P-values < 0.05); the adaptive switching mechanism and the cooperation of the two strategies are effective. DEIE achieves a lower RMSD on 31 of the 50 proteins (62%) and a higher TM-score on 42 of the 50 proteins (84%). Fig.7 shows one case in which the models predicted by all algorithms are superimposed on the experimental structure. For protein 1F1E_A, the native structure and the structures predicted by Rosetta-d, L-BFGSfold, DEIE_v1, DEIE_v2, and DEIE are marked in cyan, pink, blue, orange, and green, respectively.
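As a reference for the RMSD measure used above, a minimal computation over two conformations is shown below. Real evaluations first optimally superimpose the structures (e.g. with the Kabsch algorithm), a step this sketch omits by assuming the coordinates are already aligned:

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two already-superimposed
    conformations, given as N x 3 coordinate arrays (one row per
    residue or atom). Smaller values indicate better model accuracy."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    # mean squared per-residue deviation, then square root
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))
```

A uniform 3 Å displacement of every residue along one axis, for instance, yields an RMSD of exactly 3 Å.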
The prediction accuracy is reflected by the similarity between the predicted and native structures. As shown in Fig.7, DEIE achieves more accurate results than the other methods.

VI. CONCLUSION
This paper presents the DEIE algorithm, which balances the exploitation and exploration abilities of DE by using a suitable mutation strategy in each stage. In DEIE, an information entropy metric is proposed to determine the current stage of evolution by combining the number of Markov states with the transition matrix of the Markov state model. The information entropy metric uses the historical evolutionary information across generations, which reveals the trend of the movement of individuals in the search space. It thus provides a reasonable estimate of the extent to which the population has explored the solution space and divides the evolutionary process into two stages. Consequently, the stage-specific mutation strategy is allocated to the population individuals according to the information entropy. Experimental results described in Section V verify that DEIE performs better than, or at least competitively with, diverse state-of-the-art DE variants on the CEC2013, 2014, and 2017 benchmark sets. Moreover, DEIE also achieves promising performance on the real-world PSP problem. The sensitivity of DEIE to its parameters is also studied. The main work of this manuscript focuses on the entropy-based dynamic division of the evolutionary stages and stage-specific mutation strategy adaptation. To avoid confusion about the effectiveness of the dynamic stage division, only simple basic mutation strategies are adopted to highlight the main ideas of the manuscript; this may be the reason why DEIE does not perform as well as some top-ranked algorithms on certain benchmark functions. New mutation strategies optimized for the stage division and performance tuning on the benchmark functions may be another promising direction, which could further enhance the algorithm. In addition, DEIE is well suited to large and complex practical PSP applications because it may tackle the multistage problems of general concern in this field.