Correlations Between the Scaling Factor and Fitness Values in Differential Evolution

Designing a fitness-based adaptive scaling factor (<inline-formula> <tex-math notation="LaTeX">$F$ </tex-math></inline-formula>) is an effective method to enhance the performance of differential evolution (DE) algorithms. This paper investigates the correlations between <inline-formula> <tex-math notation="LaTeX">$F$ </tex-math></inline-formula> and fitness values of target vectors, base vectors and difference vectors. The correlations are described by the notions of monotonicity and nonlinearity. Monotonicity is used to examine whether the optimization performance of DE and the fitness values of certain vectors are positively or negatively correlated. Nonlinearity denotes the operation in which nonlinear mappings are used to redistribute the values of <inline-formula> <tex-math notation="LaTeX">$F$ </tex-math></inline-formula> in [0, 1] so as to boost the optimization performance. These two aspects of correlations are empirically tested on the Numerical Optimization Competition benchmark functions in IEEE Congress on Evolutionary Computation. Simulation results reveal different qualitative and quantitative correlations between <inline-formula> <tex-math notation="LaTeX">$F$ </tex-math></inline-formula> and fitness values of different vectors. Then, a new <inline-formula> <tex-math notation="LaTeX">$F$ </tex-math></inline-formula> that combines these relations is designed. Its strength is numerically verified by testing different CEC benchmark functions.


I. INTRODUCTION
The differential evolution (DE) algorithm invented by Storn and Price [1] is a powerful population-based global search tool. DE is considered effective for problems involving nonlinear and non-differentiable functions [2]. According to [3], [4], 8714 DE research articles were indexed in the Science Citation Index database (via Web of Science) from 2007 to 2015. DE has been successfully applied in diverse areas, such as spacecraft trajectory design [5] and statistical fisheries model estimation [6]; the major applications are also summarized in [3], [4]. DE uses the differences between distinct members of the current population as guidance in the search for a better solution. Compared with other intelligent algorithms, DE has the merits of few control parameters, good optimization performance and low space complexity [3]. Nevertheless, as an evolutionary algorithm, DE needs a large number of fitness function evaluations to obtain the global optimum. To improve the performance, different types of enhanced DE have been developed in recent years. Comprehensive surveys of DE can be found in [3], [4], [7].
The associate editor coordinating the review of this manuscript and approving it for publication was Wei Wei.
Among various techniques to improve DE, choosing good values of the scaling factor (F) in each generation is commonly an efficient option. Generally, the mechanisms to design F can be categorized into four groups: fixed value, random value, history-based adaption and fitness-based adaption.
Fixed value indicates that F remains constant during the whole optimization. Storn and Price [1] indicated that F is not difficult to choose for good results. In their opinion, 0.5 can be a good initial choice of F. After testing different parameter settings for DE on the Sphere, Rosenbrock's and Rastrigin's functions, Gämperle et al.
[8] found that the global searching ability and the convergence are very sensitive to the value of F. They suggested 0.6 as the initial choice. In another paper, Rönkkönen et al. [9] stated that setting F = 0.9 strikes a good balance between the speed and the probability of convergence. The benefit of a fixed value lies in its simplicity. For complex problems such as the multimodal optimization problem [10] and problems with a constrained experimental domain [11], the fixed value has been successfully employed.
Alternatively, the value of F can be updated by random functions; this is the random value type. Das et al. [12] proposed the DE with Random Scale Factor (DERSF) and the DE with Time Varying Scale Factor (DETVSF). DERSF allows for stochastically scaling difference vectors and thus can help to retain population diversity. With DETVSF, individuals are encouraged to sample diverse zones of the search space during the early stages of the search. In the late stage, they tend to exploit the interior of a relatively small space in which the suspected global optimum lies. In the SaDE algorithm [13], F is drawn from a normal distribution with a mean value of 0.5 and a standard deviation of 0.3. By doing so, SaDE attempts to maintain both exploration capability (with large F values) and exploitation capability (with small F values). Compared with a fixed value, randomization produces a wider range of F values and can thus enhance the performance to some extent. Random values of F were used in some modified DE algorithms, such as TLBSaDE [14] and MDE [15].
The third type, the history-based adaption technique, adaptively computes F by learning from the successes of past generations. It is widely applied in adaptive and self-adaptive DE algorithms, such as JADE [16], jDE [17] and SHADE [18]. In recent years, many SHADE-based algorithms [19] have been proposed, and some have performed well on different IEEE CEC benchmark functions.
However, the mechanism to design the scaling factor in these improved DE algorithms is similar to that in SHADE. Based on the analysis in [20], an ensemble sinusoidal approach to automatically adapt the values of F was designed in LSHADE-EpSin [21]. It is believed that the performance of LSHADE-EpSin is better than that of SHADE. EsDEr-NR [22] is an enhanced version of LSHADE-EpSin.
The last type of scaling-factor designing technique is fitness-based adaption, in which F is usually determined by fitness values from the current population. The first study of fitness-based adaption was by Ali and Törn [23], which employed the minimum and maximum fitness values of the current generation to calculate F. Ghosh et al. [24] developed a new fitness-based technique considering the fitness difference between the target vector and the best vector. Based on the idea that F should be larger for individuals with higher fitness values, Tang et al. [25] designed the rank-based scheme and the value-based scheme. In 2017, Mohamed introduced the triangular mutation scheme. In that paper, the adaptive scheme of F also takes into account both the minimum and maximum fitness values in the current generation [26].
As can be seen from the above review of F-designing techniques, history-based and fitness-based adaptive F are favored in practice. However, compared with the abundance of history-based adaptive strategies, few studies have addressed fitness-based adaptive schemes. Besides, most existing fitness-based methods mainly focus on the minimum and maximum fitness values in each generation [23]-[26]. So far, the fitness values of the other vectors have been largely ignored, although we believe they encode important information about the general structure of the fitness function. Thus, each individual's fitness should be exploited so as to obtain good values of F. In order to understand the scaling factor from a fitness-based perspective, this paper comprehensively studies the correlations between F and fitness values.
To that end, we propose a way to define the fitness-based correlation. The correlation addresses fitness values of the target vector, the base vector and the difference vector. Then, qualitative and quantitative relations are found by testing on the IEEE CEC 2014 problems. To show the potential of these correlations, a new F that combines the correlations is designed. The performance of the new F is verified on the IEEE CEC 2014 and 2017 problems. Several classic and recent F-designing techniques are employed for comparison.
The remainder of this paper is organized as follows: Section II reviews the classical DE and improved DE. Section III details the method to establish the correlations between F and different fitness values. Section IV discusses the correlations based on the numerical experiments. Section V concludes the whole paper.

II. CLASSICAL DE AND IMPROVED DE
A. CLASSICAL DE ALGORITHM
In this subsection, the classical DE algorithm [1] is briefly reviewed. In the rest of the paper, it is assumed that minimization problems are to be solved. There are four basic steps in the classical DE: initialization, mutation, crossover and selection.

1) INITIALIZATION
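The population of DE at generation $G$ is represented as
$$P_G = \{x_{1,G}, x_{2,G}, \ldots, x_{NP,G}\}, \quad x_{i,G} = [x_{i,G}^{1}, x_{i,G}^{2}, \ldots, x_{i,G}^{D}], \tag{1}$$
where $D$ is the dimension of variables and $NP$ denotes the population size. The minimum and maximum of $x$ are defined as
$$x_{\min} = [x_{\min}^{1}, \ldots, x_{\min}^{D}], \quad x_{\max} = [x_{\max}^{1}, \ldots, x_{\max}^{D}]. \tag{2}$$
Then, a common method to initialize the $j$-th component of the $i$-th individual $x_{i,0}$ is
$$x_{i,0}^{j} = x_{\min}^{j} + \mathrm{unif}(0, 1)\,(x_{\max}^{j} - x_{\min}^{j}), \tag{3}$$
where unif(0, 1) is a uniformly distributed random variable within the range of [0, 1].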

2) MUTATION
Let $x_{i,G}$ be an individual at generation $G$. After initialization, a donor vector $v_{i,G}$ with respect to the target vector $x_{i,G}$ is produced by the following mutation operator:
$$v_{i,G} = x_{r_1^i,G} + F\,(x_{r_2^i,G} - x_{r_3^i,G}), \tag{4}$$
where the indices $r_1^i$, $r_2^i$ and $r_3^i$ are randomly generated, mutually distinct integers within the range of [1, NP] and are all different from the index $i$. These indices are randomly generated once for each donor vector. Here, $x_{r_1^i,G}$ is termed the base vector, and $x_{r_2^i,G} - x_{r_3^i,G}$ is called the difference vector. $F$ is the scaling factor and is usually constrained to the range of [0, 1].

3) CROSSOVER
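The purpose of the crossover operator is to produce a trial vector $u_{i,G}$ by combining $x_{i,G}$ and $v_{i,G}$. Let $u_{i,G}^{j}$ be the $j$-th component of $u_{i,G}$. The following rule is applied elementwise:
$$u_{i,G}^{j} = \begin{cases} v_{i,G}^{j}, & \text{if } r_{i,j} \le Cr \text{ or } j = j_{rand}, \\ x_{i,G}^{j}, & \text{otherwise}, \end{cases} \tag{5}$$
where $j_{rand}$ is a randomly chosen integer in the range of [1, D], $r_{i,j} = \mathrm{unif}(0, 1)$, and $Cr$ is the crossover rate, defined in the range of [0, 1].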

4) SELECTION
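Once $u_{i,G}$ is generated, the fitness values of $u_{i,G}$ and $x_{i,G}$ are calculated and compared. The vector that survives to the next generation is selected by the following rule:
$$x_{i,G+1} = \begin{cases} u_{i,G}, & \text{if } f(u_{i,G}) \le f(x_{i,G}), \\ x_{i,G}, & \text{otherwise}, \end{cases} \tag{6}$$
where $f(u_{i,G})$ and $f(x_{i,G})$ represent the fitness values of $u_{i,G}$ and $x_{i,G}$. The framework of the classical DE is shown in Algorithm 1.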
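The four steps above can be sketched in a few lines of Python. This is only a minimal illustration of DE/rand/1 with binomial crossover, not the authors' implementation: the function name `classical_de`, the default parameter values, the absence of bound handling and the Sphere test objective are all assumptions made for the example.

```python
import random


def classical_de(fitness, lower, upper, pop_size=20, F=0.5, Cr=0.9,
                 generations=200, seed=0):
    """Illustrative DE/rand/1/bin for minimization (not the authors' code)."""
    rng = random.Random(seed)
    dim = len(lower)
    # Initialization: sample each component uniformly in [lower, upper].
    pop = [[lower[j] + rng.random() * (upper[j] - lower[j]) for j in range(dim)]
           for _ in range(pop_size)]
    fit = [fitness(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = list(pop), list(fit)
        for i in range(pop_size):
            # Mutation: three mutually distinct indices, all different from i.
            others = [k for k in range(pop_size) if k != i]
            r1, r2, r3 = rng.sample(others, 3)
            donor = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                     for j in range(dim)]
            # Binomial crossover: component j_rand always comes from the donor.
            j_rand = rng.randrange(dim)
            trial = [donor[j] if (rng.random() <= Cr or j == j_rand)
                     else pop[i][j] for j in range(dim)]
            # Selection: the trial vector survives if it is not worse.
            # (No bound handling is applied to the trial vector here.)
            f_trial = fitness(trial)
            if f_trial <= fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]


def sphere(x):
    """Sphere function, used here only as a toy objective."""
    return sum(v * v for v in x)


best_x, best_f = classical_de(sphere, [-5.0] * 5, [5.0] * 5)
print(best_f)
```

In this sketch F is a constant; a fitness-based scheme would instead compute a separate F for each donor vector from the fitness values already stored in `fit`.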
Generally, the techniques to improve DE can be categorized into four aspects, by separately or jointly designing the following: mutation operator, population size NP, scaling factor F, and crossover rate Cr. To classify the different variants of mutation operators, the notation ''DE/x/y/z'' is introduced. Here, x represents the base vector to be perturbed, y denotes the number of difference vectors, and z stands for the type of crossover. Two types of crossover have been considered: exponential (exp) and binomial (bin). The binomial type is mostly used, and the ''DE/x/y/z'' notation is usually shortened to ''DE/x/y''. The classical DE of the first DE paper [1] is denoted as DE/rand/1. Then, DE/best/1, DE/rand/2, DE/best/2 [27] and DE/current-to-pbest/1 [16] were proposed and widely used in other improved DE. More complicated mutation operators were designed in [26], [28]-[30]. It should be noted that no matter which mutation operator is chosen for DE, designing the values of NP, F and Cr is always necessary. In 2006, Teo [31] first demonstrated the feasibility of self-adapting the population size parameter in DE. L-SHADE [32], which uses the linear population size reduction technique, showed a strong performance improvement and ranked as the best algorithm on the IEEE CEC 2014 problems. Poláková et al. [33] improved the population reduction technique so that the population size can decrease or increase during the search. In EsDEr-NR [22], the niching-based population reduction method was employed to determine the population size in each generation. As for F and Cr, many significant developments have also been made, such as the invention of jDE [17], JADE [16], and DE-RCO [34]. Table 1 lists six improved DE algorithms.
Here ''1'' indicates that one of the aforementioned features has been elaborately designed; ''0'' refers to retaining the original setting; ''−'' means that the parameter is not a main design target; and the bold ''1'' denotes the parameter of primary concern in the paper. As can be seen, many modern DE algorithms have complex strategies for jointly tuning all these factors. However, recent work appears to concentrate on studying only one parameter. On one hand, focusing on one parameter can help to provide insights into DE algorithms. On the other hand, new advances in tuning a single parameter can be plugged into existing DE algorithms to further improve performance, as is evident in [29] and [34]. Note that F is the unique feature of DE algorithms, as compared with NP and Cr. In fact, F is strongly bound to mutation operators. For example, in (4), F is used to scale the difference vector. In JADE [16], F is responsible for two difference vectors. As a first step to understand the fitness-adaptive DE, we choose the mutation operator in (4). Based on (4), the correlations between F and fitness values of different vectors will be discussed.

III. CORRELATIONS BETWEEN F AND FITNESS VALUES
A. ARCHITECTURE OF THE CORRELATIONS
Note from Section II-A that in a DE algorithm, the trial vector is the vector that is evaluated and compared. As shown by the diagram in (7), F contributes to the trial vector mainly through the donor vector. At first glance, the donor vector and the target vector may appear independent of each other. However, the donor vector is weakly coupled with the target vector, since the base vector and the difference vector are deliberately generated to be different from the target vector. Thus, in this work it is reasonable to assume that F is related to the fitness values of target vectors, base vectors and difference vectors. Hereafter, these vectors are referred to as tested vectors. Let f (·) denote the fitness value of ''·''. To evaluate the correlations between F and fitness values of these vectors, the functional form of F is written as

[Equation (8): F expressed through the contributions of the target vector, the base vector and the difference vector]
where r i , r i 1 , r i 2 , r i 3 refer to the indices in classical DE, g T , g B , g D represent the contributions of target vector, base vector and difference vector, respectively. In the following the functions g T , g B , g D are to be resolved.
To that end, several functional forms of the g (·) functions are considered to compute F. The computed F values are then plugged into existing DE algorithms to run a large number of numerical experiments. The performance of each g (·) function is recorded and then compared. Good functional forms of F can thus be identified. Then, correlations between F and fitness values of tested vectors are obtained.
In this paper, the following two features of g (·) functions are mainly considered: monotonicity and nonlinearity. Here ''monotonicity'' refers to the comparison between optimization performance improvement/deterioration with F computed from two reversely-designed formulae. For example, it is termed as ''positive correlation'' if F with g(f (v)) performs better than that with 1 − g(f (v)), where g(·) is a monotonically increasing function of v and v is a tested vector. On the other hand, ''nonlinearity'' denotes the operation that redistributes values of F in [0, 1]. In this work we use modified power functions (see later in (15)) to account for the nonlinearity.

B. MONOTONICITY
In the following, g T , g B , g D are designed elaborately. For g T , F can be designed to measure the relative deficiency of the target vector's fitness compared with the current best. This is denoted as the ''proportion type'' and is given by (9), where f min,G and f max,G are the minimum and maximum fitness values of generation G.
Conversely, F can be understood as the improvement of the target vector's fitness compared with the current worst. This is termed the ''reverse proportion'' type and is given by (10). Note that F p,t + F rp,t = 1.
Similarly, for g B , the proportion type of F is given by (11) and the reverse-proportion type by (12). For g D , the proportion type of F is given by (13) and the reverse-proportion type by (14). Here F p,d and F rp,d can be understood as the local relative roughness/smoothness of the fitness function, thus providing an intuitive way to characterize the local structure.
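As a rough illustration of the proportion and reverse-proportion types, the Python sketch below uses min-max normalization over the current generation. The exact formulae (9)-(14) are not reproduced in this text, so every expression here is an assumption chosen only to match the verbal description (each pair sums to 1, and the difference-vector quantity measures a local fitness gap). The function names `proportion` and `candidate_F` are hypothetical.

```python
def proportion(value, f_min, f_max):
    """Assumed min-max normalization of a fitness value within a generation."""
    if f_max == f_min:  # degenerate case: all fitness values are equal
        return 0.0
    return (value - f_min) / (f_max - f_min)


def candidate_F(fit, i, r1, r2, r3):
    """Six assumed candidates of F for target i, base r1, difference (r2, r3)."""
    f_min, f_max = min(fit), max(fit)
    p_t = proportion(fit[i], f_min, f_max)    # target vector vs. current best
    p_b = proportion(fit[r1], f_min, f_max)   # base vector vs. current best
    # Assumed form for the difference vector: normalized fitness gap.
    p_d = proportion(abs(fit[r2] - fit[r3]) + f_min, f_min, f_max)
    return {
        "p_t": p_t, "rp_t": 1.0 - p_t,
        "p_b": p_b, "rp_b": 1.0 - p_b,
        "p_d": p_d, "rp_d": 1.0 - p_d,
    }


fit = [1.0, 3.0, 5.0, 9.0]
values = candidate_F(fit, i=1, r1=2, r2=0, r3=3)
print(values)
```

Under these assumptions, a target vector close to the current best receives a small proportion-type F and a large reverse-proportion F, which is the pair of opposite designs that the monotonicity experiments compare.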

C. NONLINEARITY
The better F in Section III-B, denoted by F 0 , is used as a benchmark of performance. Then, nonlinear mappings of F 0 are searched to improve the performance. The exact expressions are given in (15). Fig. 2 presents the new distribution of F after applying (15). As can be seen from Fig. 2, these maps have distinct behaviors. For power 1/2 and power 2, the new F spans the whole [0, 1]. However, the power-2 map tends to concentrate F at smaller values, whereas the power-1/2 map prefers larger values. Similarly, the power-1/3 map and the power-3 map have opposite behaviors. The power-1/3 map produces two peaks near the left and right ends, whereas the power-3 map condenses F around 0.5. Thus, (15) can be used to represent most of the change processes from F 0 to F.
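The qualitative behavior described for Fig. 2 can be reproduced with simple maps on [0, 1]. The modified power functions in (15) are not reproduced in this text, so the two functions below are hypothetical stand-ins: a plain power for the powers 2 and 1/2, and a power centered at 0.5 for the powers 3 and 1/3. They are chosen only because they show the same concentration patterns and should not be read as the authors' formulae.

```python
import math
import random


def plain_power(f0, p):
    """Assumed map for powers 2 and 1/2: F = F0 ** p."""
    return f0 ** p


def centered_power(f0, p):
    """Assumed map for powers 3 and 1/3: odd power centered at 0.5."""
    t = 2.0 * f0 - 1.0  # rescale [0, 1] to [-1, 1]
    return 0.5 + 0.5 * math.copysign(abs(t) ** p, t)


def share_in(values, lo, hi):
    """Fraction of values that fall inside [lo, hi]."""
    return sum(lo <= v <= hi for v in values) / len(values)


rng = random.Random(1)
f0_samples = [rng.random() for _ in range(10000)]  # uniform F0 in [0, 1)

near_half_p3 = share_in([centered_power(v, 3) for v in f0_samples], 0.4, 0.6)
near_half_p13 = share_in([centered_power(v, 1 / 3) for v in f0_samples], 0.4, 0.6)
small_p2 = share_in([plain_power(v, 2) for v in f0_samples], 0.0, 0.5)
small_p12 = share_in([plain_power(v, 0.5) for v in f0_samples], 0.0, 0.5)
print(near_half_p3, near_half_p13, small_p2, small_p12)
```

With uniformly distributed F 0 , the centered power-3 map places most of its mass near 0.5, the centered power-1/3 map pushes it toward both ends, and the plain power-2 and power-1/2 maps favor small and large values respectively.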

IV. NUMERICAL EXPERIMENTS AND RESULTS
In this section, numerical experiments are designed to test the performance of different F. Inspired by machine learning practice, a ''training set'' is employed to determine the exact (proportion/reverse-proportion) types and the best powers of F for the tested vectors. Then a ''test set'' is used to validate the performance of these correlations. For the training set, the ''Real-Parameter Single Objective Optimization'' problems of IEEE CEC 2014 (hereafter IEEE CEC 2014 problems) [35] are adopted. The IEEE CEC 2014 problems have 30 benchmark functions: 3 unimodal functions, 13 simple multimodal functions, 6 hybrid functions and 8 composition functions. Both unimodal/multimodal and separable/non-separable problems are included. Many of the functions have large numbers of local optima. In some cases, such as functions 11 and 12, the second-best local optimum is far from the global optimum.
We design two reversed relations to find the better monotonicity formulation for each tested vector, for example, (9) and (10) for the target vector. These two relations, denoted as the proportional and reverse-proportional formulae, are used to solve the IEEE CEC 2014 problems. The one that performs better is chosen as the representative monotonic relation.
For the test set, the benchmark functions (without function 2) in ''Real-Parameter Single Objective Optimization'' of IEEE CEC 2017 (hereafter IEEE CEC 2017 problems) [36] are considered. In IEEE CEC 2017 problems, there are also 30 benchmark functions: 3 unimodal functions, 7 simple multimodal functions, 10 hybrid functions and 10 composition functions.

A. MONOTONICITY AND NONLINEARITY OF A SINGLE VECTOR
Before solving an optimization problem, Cr and NP need to be designed. According to [13], Cr is usually sensitive to problems with different characteristics. Thus, the mechanism for determining the value of Cr should be designed carefully. In this section, Cr is designed in the same way as in EsDEr-NR [22]. As for NP, two cases are considered: (1) adaptive NP, the same as that in EsDEr-NR [22]; (2) fixed NP, NP = 5D [8], where D is the dimension of the problem.

1) ADAPTIVE NP
First, the adaptive NP is used. Table 2 compares (9) and (10) on the 10-D version of the IEEE CEC 2014 problems. Here the benchmark is set as (10). In order to analyze the solution quality from a statistical point of view, the results are compared using the Wilcoxon rank-sum test with a significance level of 0.05 [37]. For each F and each benchmark function, 51 independent runs are performed. The mean and standard deviation of the errors over these runs are recorded in the table. The best solutions, with the smallest mean error for each function, are marked in boldface. After comparison, one of three signs (+, −, =) is assigned: ''+'' means that (9) performs significantly better than (10); ''−'' means that (9) performs significantly worse than (10); and ''='' means that the two F show no significant performance difference. According to the table, (10) performs better on 6 functions and worse on 4 functions. Thus, (10) is slightly better than (9).
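The sign assignment described above can be sketched as follows. This is an illustrative pure-Python version of the two-sided Wilcoxon rank-sum test using the normal approximation without tie correction; the function names `rank_sum_test` and `sign_of` are hypothetical, and the error samples in the usage lines are synthetic placeholders rather than results from the paper.

```python
import math


def rank_sum_test(a, b):
    """Return (z, p) of the two-sided Wilcoxon rank-sum test (normal approx.)."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    k = 0
    while k < len(pooled):
        m = k
        # Tied values share the average of the ranks they occupy.
        while m + 1 < len(pooled) and pooled[m + 1][0] == pooled[k][0]:
            m += 1
        for t in range(k, m + 1):
            ranks[t] = (k + m) / 2.0 + 1.0
        k = m + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (w - mean) / sd
    return z, math.erfc(abs(z) / math.sqrt(2.0))


def sign_of(errors_new, errors_ref, alpha=0.05):
    """'+' if errors_new is significantly smaller, '-' if larger, else '='."""
    z, p = rank_sum_test(errors_new, errors_ref)
    if p >= alpha:
        return "="
    return "+" if z < 0 else "-"


# Synthetic placeholder data: 51 runs per scheme, as in the experimental setup.
low_errors = [0.1 * k for k in range(51)]
high_errors = [10.0 + 0.1 * k for k in range(51)]
print(sign_of(low_errors, high_errors), sign_of(high_errors, low_errors))
```

Counting the signs over all 30 benchmark functions then gives win/loss/tie totals of the kind reported for each pair of formulae.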

a: TARGET VECTOR
Furthermore, the performance of mapping F in (10) via (15) is tested. Table 7 (See Appendix) records the detailed optimization results and the comparison results for each function. Figure 3 depicts the total numbers of ''+'', ''−'' and ''='' for different powers. Here the benchmark is set as power 3 of (10). It can be found that power 3 of (10) outperforms powers 1/2, 2 and 1/3 to a large extent and is slightly better than power 1. Thus, 3 is regarded as the best power of (10).

b: BASE VECTOR
Table 3 shows the comparison results of (11) and (12). Here the benchmark is set as (11). Compared to (12), (11) is better on 13 functions and worse on only 6 functions. Intuitively, if the fitness value of the base vector is low, a smaller F is better because the donor vector can inherit more from the base vector. In other words, exploitation is preferred. Then, the performances of different powers of (11) are compared. The optimization results and the comparison results for each function are given in Table 8 (See Appendix). Fig. 4 shows the total numbers of ''+'', ''−'' and ''='' for different powers. Here the benchmark is set as power 3. Among the 5 powers, 1 and 3 perform the best. Considering that power 3 finds more of the best solutions, 3 is chosen as the best power of (11).

c: DIFFERENCE VECTOR
Table 4 shows the comparison results for (13) and (14). Here the benchmark is set as (14). Equation (14) is better than (13) on 19 functions and worse on only 4 functions. Intuitively, when the fitness of difference vectors is small, F should be large enough to have a substantial perturbation on the base vector. In other words, exploration is encouraged during the search. Then, based on (14), 5 different powers are tested, and the results are recorded in Table 9 (See Appendix). The total numbers of ''+'', ''−'' and ''='' for different powers are shown in Fig. 5. Here the benchmark is set as power 3. Clearly, power 3 outperforms the others.

d: DISCUSSIONS OF MONOTONICITY AND NONLINEARITY
Learning from the experiments above, it is found that the correlations between F and fitness values of different vectors are different. Firstly, the numbers of equivalent results (ties) of (9)/(10), (11)/(12) and (13)/(14) with power-1 are 20, 11 and 7, respectively. It can be understood that the larger the number of ties, the less sensitive F is to the fitness of this vector. Namely, F is most sensitive to the fitness values of difference vectors. Secondly, the numbers of good and bad results of (10)/(9), (11)/(12) and (14)/(13) with power-1 are 6/4, 13/6 and 19/4, respectively. Thus, F is in proportion to the fitness values of base vectors, whereas it has the opposite dependence on those of target vectors and difference vectors. It is interesting to note that the sensitivities are different. For example, the ratio 6/4 again indicates that a good F is weakly dependent on the target vector, since (9) and (10) have roughly the same trend of improvement. Thus, the analysis above provides quantitative insights into how DE works, which can guide the design of F.
Thirdly, from the analysis of nonlinearity, it can be found that the best power for all these vectors is 3. Figure 2 reveals that the power-3 map tends to produce a condensed F around 0.5. In the previous studies of [1] and [8], 0.5 and 0.6 are suggested as initial values of F. Thus, power 3 is consistent with the conclusions in [1] and [8].
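The tie counts quoted in this discussion follow directly from the win/loss counts over the 30 IEEE CEC 2014 functions. The short Python check below recomputes them; the dictionary keys are labels chosen for this example only.

```python
TOTAL_FUNCTIONS = 30  # benchmark functions in the IEEE CEC 2014 problems

# (better, worse) counts of the preferred formula against its reversed design.
wins_losses = {
    "target: (10) vs (9)": (6, 4),
    "base: (11) vs (12)": (13, 6),
    "difference: (14) vs (13)": (19, 4),
}

# Ties are the functions on which neither formula is significantly better.
ties = {name: TOTAL_FUNCTIONS - better - worse
        for name, (better, worse) in wins_losses.items()}

# Fewer ties means F is more sensitive to the fitness of that vector.
most_sensitive = min(ties, key=ties.get)
print(ties, most_sensitive)
```

The ordering 20 > 11 > 7 is what supports the statement that F is least sensitive to the target vector and most sensitive to the difference vector.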

2) FIXED NP
In order to study the effect of NP on the correlations, in this subsection, the population size is fixed, namely, NP = 5D. Another round of comparison reveals that the correlations remain the same. The results are recorded in Table 5.

B. COMBINATION OF THESE CORRELATIONS
In fact, the previously discovered correlations can be combined to obtain a novel way of designing F. As an illustrative example, in this section we choose a simple combination, namely, the average of the power-3 formulae of (10), (11) and (14), denoted by $\bar{F}$ and given in (16). Similar to the treatment in Section III-C, we first evaluate the distribution of $\bar{F}$. Without loss of generality, we assume f (x) ∈ [0, 1]. We uniformly take 1000 samples of f (x); Fig. 6 presents the distribution of $\bar{F}$. Different from Section III-C, here $\bar{F}$ is computed intermediately via the formulae (10), (11) and (14), which are all directly computed from fitness values. After being mapped by (15), the values of (10), (11) and (14) are mostly concentrated around 0.5, with a shape resembling a normal distribution.
Next, we consider applying a composite nonlinear function to (16). Although it was empirically shown above that the power-3 map in (15) is a good option, it is inappropriate to use it directly here, since the distribution of $\bar{F}$ is far from uniform. Instead, we adopt a new strategy via a simple translation, given in (17) and (18), where $u = \bar{F} + \Delta F$ and $\Delta F = 0, \pm 0.1, \pm 0.2, \pm 0.3$. The performance of (17) under each $\Delta F$ is compared in solving the 10-D version of the IEEE CEC 2014 problems. Here the benchmark is set as $\Delta F = 0$. Figure 7 shows that negative $\Delta F$ achieves better results, and $\Delta F = -0.2$ is the best. Inspired by this translation behavior, the final F takes the form given in (19). This form comes from the observation that squaring a random variable in [0, 1] decreases its expectation, which is similar to the translation behavior with $\Delta F < 0$. In addition, (19) increases the nonlinearity of (16). Though (19) appears complicated, it only involves algebraic calculation of known fitness values. It is worth mentioning that these fitness values of the tested vectors have already been calculated in the selection stage, namely, the fourth step in the classical DE, or Line 14 in Algorithm 1. Thus, (19) does not add extra burden on computer resources. In fact, a numerical test shows that it takes only about 3 milliseconds to produce all the F values (1800 in total) in each generation (CPU: 3.60 GHz, RAM: 8 GB).
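The observation that squaring a random variable in [0, 1] lowers its expectation, much like a negative translation, can be checked numerically. The Python sketch below is only an illustration: the averaged quantity is replaced by the mean of three independent uniform samples, which is an assumption made for this example and not the distribution shown in Fig. 6.

```python
import random

rng = random.Random(0)


def mean(values):
    return sum(values) / len(values)


# Stand-in for an averaged F: mean of three uniform samples (an assumption).
averaged = [(rng.random() + rng.random() + rng.random()) / 3.0
            for _ in range(100000)]

squared = [u * u for u in averaged]    # second-order power of the average
shifted = [u - 0.2 for u in averaged]  # translation by -0.2, for comparison

print(mean(averaged), mean(squared), mean(shifted))
```

For this stand-in, the squared values have an expectation of about 0.28, close to the 0.3 obtained by shifting the average by −0.2, while every squared value stays inside [0, 1].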

1) TESTING F IN (19) ON IEEE CEC 2014 PROBLEMS
The performance of (19) is tested on the IEEE CEC 2014 problems. As mentioned previously, F is strongly associated with mutation operators. Since the current work is built on the classic mutation operator, numerical comparisons are mainly limited to this type. Comparisons with other DE variants and other metaheuristic algorithms are beyond the current scope. The adaptive Cr and NP are used. In order to comprehensively evaluate the strength, 7 types of F from the literature are used: (1) 0.1 and 0.9 [9] are the fixed values; (2) rand [12] is the random value; (3) SinDE and SHADE are the history-based adaption, where SinDE is designed to be the same as that used in [22] and is used throughout all generations, and SHADE represents the method to design F in the SHADE algorithm; (4) FiADE [24] and Rbs [25] are the fitness-based adaption. FiADE is defined in terms of $\Delta f_i = |f(x_i) - f(x_{best})|$ and $\lambda = \Delta f_i/10 + 10^{-14}$.
Rbs is short for the rank-based scheme.
Figure 8 and Figure 9 show the total numbers of ''+'', ''−'' and ''='' for different schemes of F. Here, the benchmark is set as the proposed F in (19). Compared with 0.1, 0.9, rand, SinDE, FiADE and Rbs, the proposed F performs much better. Compared with SHADE, the proposed F appears slightly better. For the 30-dimension problems, the proposed method has an 8-8 tie with SHADE. When the dimension is set to 50, the proposed method beats SHADE by a larger margin, performing better on 13 functions and worse on 3 functions.
[...] (17) are the worst in this simulation. However, its performance catches up with its peers very quickly; it is among the best after only the first 100 generations, as can be seen from Fig. 10 (f). The strength of (17) is better revealed by checking other functions. For f 1 and f 3 , the exploration capability of (17) is demonstrated: the fitness value decreases rapidly during early generations. The exploitation capability of (17) can be seen by examining f 11 , f 12 and f 22 . Though the error fitness values with (17) in the early stage are not the best, its strength is evident after the generation exceeds 1000.

2) TESTING F IN (19) ON IEEE CEC 2017 PROBLEMS
The comparison between (19) and the F schemes from the literature on the IEEE CEC 2017 problems for D = 10, 30, 50 and 100 is given in Table 6. The benchmark is set as (19). R + represents the sum of ranks for the test problems in which the aforementioned previous F performs better than (19); R − represents the sum of ranks for the test problems in which the aforementioned previous F performs worse than (19). In this subsection, the technique of designing F used in the original EsDEr-NR algorithm is also employed as a comparison. With the significance level set to 0.05, it can be found that the performance of most previous F is no better than that of (19). Specifically, in most of the dimensions, (19) performs better than 0.1, 0.9, rand, FiADE and Rbs. SinDE is competitive with (19) if the dimension is low. However, when the dimension is high, such as 50 and 100, SinDE can no longer catch up with (19). The performances of SHADE and EsDEr-NR are similar to that of (19). If the dimension is 100, SHADE becomes better.
It should be noted that the combination of the correlations in (16) is preliminary. More advanced combination strategies may lead to better optimization performance. Studying different strategies for combining F with the fitness values of different vectors is beyond the scope of this work; the primary purpose of this subsection is to demonstrate the potential of these relations. From the comparison results above, it can be concluded that the performance of the proposed F in (19) is competitive.
Limited by the classical mutation operator in (4) used in this work, the performance of the new F in (19) is occasionally less powerful than the original EsDEr-NR [22]. Nevertheless, this paper provides a new perspective on designing fitness-based F. For complicated mutation operators such as that in [22], F usually controls the scale of more than one difference vector. Predictably, the correlations between F and the vectors are then more complicated. For example, F is likely to be related not only to the fitness values but also to the angles of these difference vectors. Detailed discussion of these correlations is beyond the scope of this paper. However, the simulation results in this section already indicate the existence of correlations between F and the fitness values of many vectors. Thus, it is beneficial to exploit this phenomenon. Specific results for complex mutation operators will be the focus of future work.

V. CONCLUSION
This paper presents a novel method to investigate the correlations between F and fitness values of target vectors, base vectors and difference vectors in classical DE. By testing on the single-objective-optimization problems in IEEE CEC 2014 Competitions, the qualitative and quantitative dependency is obtained. It is found that F is in proportion to the fitness values of base vectors whereas it has the opposite dependence on that of target vectors and difference vectors. Compared with target vectors, F is more sensitive to the fitness values of the base vector and difference vector. To verify the potential of these correlations, a new F is designed that comprehensively combines these relations. The expression involves a second order power function of the arithmetic mean. Simulation results show that the proposed F outperforms most current schemes of F.
This work provides a new way to tune fitness-based adaptive parameters. It can be extended to design F for newly developed mutation operators, or to design Cr for general metaheuristic algorithms. More advanced combinations of fitness-based F schemes, or correlations between the scaling factor and recent mutation operators, may also be future research directions.
Appendix: Tables 7-15.