A K-Means Clustering-Based Hybrid Offspring Generation Mechanism in Evolutionary Multi-Objective Optimization

Model-based recombination operators ignore individual quality information, while genetic-based differential evolution (DE) operators lack the extraction and use of global information. As a result, no single offspring generation method can consistently achieve excellent performance across diverse optimization problems. To address these issues, a K-means clustering-based hybrid offspring generation mechanism multi-objective evolutionary algorithm (KMDEA) is proposed. KMDEA performs K-means clustering on the population and builds a multivariate Gaussian model from the clustering results to discover the global information (the regularity property) of the population. To fuse global and individual information, this paper designs a new hybrid offspring generation mechanism (the KMD mechanism) that extracts and uses local individual information. Compared with a variety of mainstream multi-objective evolutionary algorithms (MOEAs), the results show that KMDEA has clear advantages in solving multi-objective optimization problems (MOPs) with complex characteristics.


I. INTRODUCTION
This paper considers the following continuous multi-objective optimization problems (MOPs) [28]:

minimize F(x) = (f_1(x), f_2(x), ..., f_m(x))^T, subject to x ∈ Ω,

where x = (x_1, ..., x_n)^T is the n-dimensional decision variable and Ω ⊆ R^n is the decision (variable) space. The mapping function F: Ω → R^m maps the decision vector x from the decision space to the objective space, yielding the m-dimensional objective vector y; m is the number of optimization objectives.
There are a large number of optimization problems in scientific research and production applications, such as industrial design, job scheduling, and resource allocation. With regard to MOPs, since Schaffer [29] first applied an evolutionary algorithm to solve MOPs, the study of multi-objective evolutionary algorithms (MOEAs) has been a hot topic in the field of optimization [30]. Furthermore, MOEAs can conveniently solve complex optimization problems that are difficult for traditional optimization methods. (The associate editor coordinating the review of this manuscript and approving it for publication was Victor S. Sheng.)
According to the basic ideas adopted by MOEAs, they can be roughly divided into three categories [2]: (a) MOEAs based on Pareto dominance, such as the non-dominated sorting genetic algorithm II (NSGA-II) [24] and the improved strength Pareto evolutionary algorithm (SPEA2) [23]; (b) MOEAs based on performance indicators, such as the indicator-based evolutionary algorithm (IBEA) [31] and the S-metric selection evolutionary multi-objective optimization algorithm (SMS-EMOA) [17]; (c) MOEAs based on decomposition (MOEA/D). MOEA/D [32] differs from the above two kinds of algorithms in that it is not only a specific algorithm but also a general algorithmic framework into which various evolutionary strategies can be incorporated [33].
Offspring recombination and environment selection are the two main operations of MOEAs. Offspring recombination generates new individuals, and environment selection determines which trial individuals survive. The two operations are equally important to algorithm performance. However, existing MOEAs mainly focus on the design and analysis of environment selection operators and pay little attention to recombination operators [3]. Currently, a large number of MOEAs directly apply recombination operators designed for single-objective optimization to produce new solutions [2]. In fact, single-objective optimization problems usually have one or a few global optima, while multi-objective optimization problems (MOPs) have a set of optimal trade-off solutions, whose image in the objective space is called the Pareto front (PF). The corresponding solutions in the decision space are called the Pareto set (PS). It is worth mentioning that the PS of a continuous MOP forms an (m-1)-dimensional manifold in the decision space [3]. Since the topology of single-objective and multi-objective optimization problems differs essentially, directly applying single-objective recombination operators in MOEAs cannot guarantee good offspring [2].
In evolutionary algorithms, recombination operators can be roughly divided into two categories according to the way new offspring are generated, namely genetic-based recombination operators and model-based recombination operators [1]. The model-based recombination operator approximates the manifold structure of the Pareto set by building a probability model and generates new solutions by sampling from it [4]. It uses the global statistical information of the population to estimate the distribution of the solution set at a macro level, but it ignores individual information [5]. The genetic-based differential evolution (DE) operator uses the information of individual differences to guide its further search, but it lacks the extraction and use of global information [6]. Therefore, mixing the two operators to generate new solutions has become a promising research direction.
In reference [7], part of the new offspring is sampled from a modified univariate histogram probability model, and the rest is generated by a cheap local search method that refines the parent individual without any additional function evaluations. In 2016, Li et al. proposed MOEA/D-CMA [8], which decomposes a multi-objective optimization problem into multiple single-objective subproblems; during evolution, only one selected subproblem in each group is optimized by the probability model, while the other subproblems are optimized by the DE operator. In 2017, Bing et al. [9] improved the sampling process of RM-MEDA, mapping the individuals on the model to a latent space, and used the crossover and mutation operations of the DE operator to generate new individuals. In existing hybrid offspring generation methods, sampling the model helps improve the exploitation ability of the algorithm [10], [11], while introducing the DE operator improves the exploration ability in the offspring space [12], [18].
Research has shown that in MOEAs, mating with similar individuals can improve the quality of new solutions and accelerate the convergence of the algorithm [39], [40]. Therefore, several mating restriction strategies have been proposed. Zhang et al. [26] used a predefined mating restriction probability δ to select parents from the same class, and selected parents from all solutions with probability 1-δ. In reference [34], a mating pool is likewise constructed based on a pre-defined mating restriction probability, and the value of δ is adaptively adjusted according to the recombination utility of the two parental sources. In addition, many other mating restriction strategies [35], [14] have been proposed. It is not difficult to see that most of these algorithms use probability values to decide whether new solutions perform global exploration or local exploitation. These mating restrictions pay little attention to individual quality information, and they also require setting multiple control parameters, e.g., the sharing radius [37] or candidate size [38].
In the evolution process, individuals differ in convergence and diversity. Individuals of better quality call for more exploitation; individuals of poor quality call for more exploration [36]. Therefore, to improve the performance of the DE operator, individual quality information can be used to guide the construction of the mating pool.
Based on the above analysis, and inspired by existing achievements, this paper proposes a K-means clustering-based hybrid offspring generation mechanism multi-objective evolutionary algorithm (KMDEA). The innovations of this paper are as follows:
-Compared with traditional clustering-based MOEAs, KMDEA constructs a multivariate Gaussian model according to the clustering results of the population. The cluster-wise Gaussian model is used to approximate the manifold structure and detect the regularity property.
-Compared with existing hybrid recombination operators, the model-sampled offspring are also used as mating parents, enlarging the mating pool from which the DE operator generates offspring.
-In the construction of the mating pool, unlike previous probability-based mating restriction strategies, this paper directly uses deterministic information to build the mating pool.
The rest of the paper is organized as follows. Section II presents the proposed algorithm in detail. Section III presents the experimental results, and Section IV analyzes the sensitivity of KMDEA to its control parameter. Section V concludes the paper with some remarks on future work.

II. A K-MEANS CLUSTERING BASED HYBRID OFFSPRING GENERATION MECHANISM MOEA
A. THE FRAMEWORK OF KMDEA
Algorithm 1 gives the overall framework of KMDEA. In the initialization phase, we first generate the initial population P (line 1). In each iteration, the fast non-dominated sorting scheme [24] is used to divide the population P into L non-dominated fronts, in which B_1 denotes the best front and B_L the worst front (line 3). In line 4, K-means clustering is performed on the population P to obtain the clustering results {C_K}. Lines 5-12 constitute the hybrid new solution generation mechanism, briefly described here. The mechanism first builds a Gaussian model (line 6) for the individuals in the best non-dominated front within each cluster, and samples the model to obtain the model offspring y_mod (line 7). Then, the model offspring is added to the mating pool as a mating parent to expand the pool (line 9). In this way, the sampled offspring participate in the generation of new offspring both as offspring and as mating parents, which enhances the local exploitation ability of the subsequent neighbor-based mating. Finally, the mating pool Q is established for the parent individuals in each cluster, and the DE operator generates a new solution y (lines 10-11). The result of the fast non-dominated sorting is directly applied in the construction of the mating pool Q: the non-dominated solutions in each cluster are used for local exploitation, while the dominated solutions are used for global exploration.
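One generation of the hybrid mechanism described above can be sketched as follows. This is a minimal illustrative sketch: the two-objective test problem, the helper implementations, the cluster count, and the DE scale factor are assumptions for demonstration, not the paper's actual settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-objective problem, used only to make the sketch runnable.
def F(x):
    return np.array([x[0], 1.0 - np.sqrt(x[0]) + x[1] ** 2])

def fast_nondominated_rank(Y):
    # Peel non-dominated fronts: rank 0 corresponds to the best front B_1.
    rank = np.full(len(Y), -1)
    remaining, r = list(range(len(Y))), 0
    while remaining:
        front = [i for i in remaining
                 if not any(np.all(Y[j] <= Y[i]) and np.any(Y[j] < Y[i])
                            for j in remaining if j != i)]
        for i in front:
            rank[i] = r
        remaining = [i for i in remaining if i not in front]
        r += 1
    return rank

def kmeans(X, K, iters=10):
    centers = X[rng.choice(len(X), K, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    return labels

# One generation of the hybrid mechanism (Algorithm 1, lines 3-11).
N, n, K = 20, 2, 3
P = rng.random((N, n))
Y = np.array([F(x) for x in P])
rank = fast_nondominated_rank(Y)                  # line 3
labels = kmeans(P, K)                             # line 4
for k in range(K):                                # lines 5-12
    idx = np.where(labels == k)[0]
    if len(idx) == 0:
        continue
    best = idx[rank[idx] == rank[idx].min()]      # best front in cluster k
    mu = P[best].mean(0)                          # Gaussian model (line 6)
    cov = np.cov(P[best].T) if len(best) > 1 else 0.01 * np.eye(n)
    y_mod = rng.multivariate_normal(mu, cov + 1e-6 * np.eye(n))  # line 7
    Q = np.vstack([P[idx], y_mod])                # expanded mating pool (line 9)
    r1, r2 = rng.choice(len(Q), 2, replace=False)
    y = np.clip(P[idx[0]] + 0.5 * (Q[r1] - Q[r2]), 0, 1)  # DE offspring (lines 10-11)
```

The sketch omits environment selection and looping over generations; it only shows how clustering, per-cluster Gaussian sampling, and DE mating interlock within one iteration.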

B. K-MEANS CLUSTERING
K-means clustering [13] is one of the most widely used clustering algorithms. It is a partition-based clustering method: the data set is divided into disjoint subsets according to the similarity between data objects. The K-means algorithm concretizes this similarity as the distance between each object and the pre-selected cluster centers. It assigns each object to the nearest cluster center to form a cluster, and iteratively minimizes the squared-error criterion to improve the quality of each cluster.
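The assign-then-recompute iteration just described (Lloyd's algorithm) can be sketched as follows; the blob data and iteration count are illustrative assumptions. The key property is that the squared-error criterion J never increases from one iteration to the next.

```python
import numpy as np

# Lloyd's iteration for K-means: assign each object to its nearest center,
# then recompute each center as its cluster mean; the squared-error
# criterion J is non-increasing over iterations.
def kmeans(X, K, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), K, replace=False)].copy()
    J = []
    for _ in range(iters):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        J.append(d2[np.arange(len(X)), labels].sum())  # squared-error criterion
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(0)
    return labels, centers, J

rng = np.random.default_rng(1)
# Illustrative data: three well-separated blobs.
X = np.vstack([rng.normal(c, 0.1, size=(30, 2)) for c in (0.0, 2.0, 4.0)])
labels, centers, J = kmeans(X, K=3)
```
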
In the hybrid offspring generation mechanism based on K-means clustering, the K-means algorithm is used to explore the distribution structure (neighbor relationships) [14] of the individuals in the population. Based on this distribution, a multivariate Gaussian model is constructed for the dominant individuals in each cluster to approximate the population structure, and the model offspring are generated by sampling.

C. GENERATE NEW SOLUTIONS
The Gaussian model is one of the most widely used probabilistic models [2]. A multivariate Gaussian distribution of a random variable x = (x_1, x_2, ..., x_n)^T is written x ~ N(μ, Σ), where μ is the mean vector and Σ is the covariance matrix. The corresponding probability density function is

p(x) = (2π)^(-n/2) |Σ|^(-1/2) exp( -(1/2) (x - μ)^T Σ^(-1) (x - μ) ).

For a given set of data x_1, x_2, ..., x_K, the mean vector and covariance matrix [15] are estimated as

μ = (1/K) ∑_{k=1}^{K} x_k,   Σ = (1/(K-1)) ∑_{k=1}^{K} (x_k - μ)(x_k - μ)^T.

A new point x = (x_1, x_2, ..., x_n)^T is sampled from the multivariate Gaussian model as follows:

Algorithm 2 GaussianSample
1 Obtain a lower triangular matrix A by Cholesky decomposition of the covariance matrix, satisfying Σ = AA^T;
2 Generate a vector y = (y_1, y_2, ..., y_n)^T whose components y_j ~ N(0, 1), j = 1, 2, ..., n, are independent standard normal variables;
3 Set x = μ + Ay.
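The estimators and Algorithm 2 can be sketched directly with numpy; the data-generating distribution below is an illustrative assumption. Points produced via x = μ + Ay reproduce the estimated mean and covariance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Estimate (mu, Sigma) from data, then apply Algorithm 2: Cholesky factor A
# with Sigma = A A^T, draw y ~ N(0, I), and set x = mu + A y.
data = rng.multivariate_normal([1.0, -2.0], [[1.0, 0.6], [0.6, 2.0]], size=5000)

mu = data.mean(axis=0)
Sigma = np.cov(data.T)              # 1/(K-1) normalization, as in the estimator above

A = np.linalg.cholesky(Sigma)       # lower triangular, Sigma = A @ A.T
x_new = mu + A @ rng.standard_normal(2)   # one sampled point (steps 2-3)

# Check: many points sampled this way reproduce the estimated mean/covariance.
samples = mu + rng.standard_normal((20000, 2)) @ A.T
```
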
The DE operator [16] generates new offspring for KMDEA; its details are shown in Algorithm 3. The DE operator first generates an initial offspring, and the polynomial mutation operator then mutates it. To keep the new offspring feasible, it is repaired according to its state before and after the mutation operation. In Algorithm 3, F and CR are the control parameters of the DE operator, p_m is the mutation probability, and η_m is the distribution index of the mutation operator.

Algorithm 3 y = SolGen(x, Q, @DE)
Input: Q: mating pool; x: current individual
Output: new individual y = (y_1, y_2, ..., y_n)^T
1 Randomly select two parent individuals x_r1 and x_r2 from the mating pool Q;
2 Generate an initial offspring y = (y_1, y_2, ..., y_n)^T, where y_i = x_i + F · (x_r1,i - x_r2,i) with probability CR and y_i = x_i otherwise (i = 1, ..., n);
3 Mutate the offspring y by polynomial mutation with probability p_m, where r = rand() is a uniform random number;
4 Repair infeasible components of y;
5 Return the new offspring y.
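A runnable sketch of this offspring generator is given below. The parameter defaults (F, CR, p_m = 1/n, η_m) are common MOEA/D-DE-style settings assumed for illustration, and the mutation step uses a simplified form of polynomial mutation; neither is prescribed by this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sketch of Algorithm 3: DE step y_i = x_i + F*(x_r1,i - x_r2,i) applied with
# probability CR, then simplified polynomial mutation with probability pm and
# distribution index eta_m, then box repair. Defaults are illustrative.
def sol_gen(x, Q, F=0.5, CR=1.0, pm=None, eta_m=20, lb=0.0, ub=1.0):
    n = len(x)
    pm = 1.0 / n if pm is None else pm
    r1, r2 = rng.choice(len(Q), size=2, replace=False)   # step 1
    mask = rng.random(n) < CR
    y = np.where(mask, x + F * (Q[r1] - Q[r2]), x)       # step 2
    for i in range(n):                                   # step 3: mutation
        if rng.random() < pm:
            r = rng.random()
            if r < 0.5:
                delta = (2.0 * r) ** (1.0 / (eta_m + 1)) - 1.0
            else:
                delta = 1.0 - (2.0 * (1.0 - r)) ** (1.0 / (eta_m + 1))
            y[i] += delta * (ub - lb)
    return np.clip(y, lb, ub)                            # step 4: repair

Q = rng.random((10, 5))   # mating pool
x = rng.random(5)         # current individual
y = sol_gen(x, Q)
```
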

D. ENVIRONMENT SELECTION
The purpose of the environment selection operation is to retain effective offspring. KMDEA adopts the hypervolume-indicator-based environment selection method of the SMS-EMOA algorithm [17]. Algorithm 4 gives the details of the environment selection method adopted by KMDEA.
In line 1, the new solution and the current population are merged, and the fast non-dominated sorting scheme divides the merged population into L non-dominated fronts, in which B_1 denotes the best front and B_L the worst front. Then, the individual with the smallest hypervolume contribution in the worst front B_L is identified and removed (lines 2-3). Details of the hypervolume calculation can be found in [17].

E. TIME COMPLEXITY
At each generation, the operations of the developed algorithm comprise 1) a K-means clustering procedure and 2) the hybrid offspring generation mechanism. The time complexity of K-means is O(NKnI), where N is the population size, K the number of clusters, n the number of decision variables, and I the number of clustering iterations. The hybrid offspring generation costs O(Nn). In total, the time complexity per generation is therefore O(NKnI + Nn) = O(NKnI).

Algorithm 4 P = Select(P, y)
Input: P: population; y: newly generated individual
Output: the updated population P
1 Perform fast non-dominated sorting on P ∪ {y} to obtain the fronts B = {B_1, ..., B_L};
2 Identify the worst individual x* = arg min_{x ∈ B_L} Δ(x, B_L), where Δ(x, B_L) is the hypervolume contribution of x to B_L;
3 Delete the worst individual: P = (P ∪ {y}) \ {x*};
4 Return P.
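The steady-state selection step can be sketched as follows for the two-objective case. The example points, reference point, and helper names are illustrative assumptions; the 2-D hypervolume routine is the standard sweep over a sorted non-dominated set, not the paper's implementation.

```python
import numpy as np

# Sketch of Algorithm 4: merge the new point into the population, peel
# non-dominated fronts, and delete from the worst front the point with the
# smallest hypervolume contribution (2-objective case, minimization).
def dominates(a, b):
    return np.all(a <= b) and np.any(a < b)

def fronts(Y):
    idx, out = list(range(len(Y))), []
    while idx:
        f = [i for i in idx if not any(dominates(Y[j], Y[i]) for j in idx if j != i)]
        out.append(f)
        idx = [i for i in idx if i not in f]
    return out

def hv2d(Y, ref):
    # Hypervolume of a 2-objective non-dominated set w.r.t. a reference point:
    # sweep points by ascending f1, accumulate rectangular slabs.
    Y = sorted(map(tuple, Y))
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in Y:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

def select(Y, ref):
    B = fronts(Y)
    worst = B[-1]
    whole = hv2d([Y[i] for i in worst], ref)
    contrib = [whole - hv2d([Y[j] for j in worst if j != i], ref) for i in worst]
    drop = worst[int(np.argmin(contrib))]      # smallest contribution is removed
    return [i for i in range(len(Y)) if i != drop]

# Merged objective vectors P ∪ {y}; the last point is dominated and is removed.
Y = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0]])
keep = select(Y, ref=(5.0, 5.0))
```
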

III. EXPERIMENTAL STUDY
A. TEST PROBLEMS AND PERFORMANCE METRICS
This paper selects the GLT1-GLT6 and LZ1-LZ9 test functions, 15 problems in total, to test the performance of KMDEA. Among them, the GLT test suite [18] has complicated PF shapes, while the LZ test suite [6] has complicated PS structures. More details are shown in Table 1. The performance metrics are the inverted generational distance (IGD) and the hypervolume (HV).

1) INVERTED GENERATION DISTANCE (IGD)
The inverted generational distance is a comprehensive performance metric [21]. It reflects both the convergence and the distribution of the algorithm by averaging the minimum distances from uniformly sampled points on the true Pareto front to the obtained non-dominated set. It is defined as

IGD(P*, P) = ( ∑_{x* ∈ P*} d(x*, P) ) / |P*|,

where d(x*, P) is the shortest Euclidean distance between the point x* and the solution set P obtained by the algorithm, and |P*| is the number of points in P*. The smaller the IGD value, the better the comprehensive performance (convergence and diversity) of the algorithm.
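The IGD computation is a short numpy expression. In the sketch below, the linear front f2 = 1 - f1 and the uniform offset are illustrative assumptions: an obtained set identical to the reference set gives IGD = 0, and shifting every point by (0.1, 0.1) gives IGD = sqrt(0.02).

```python
import numpy as np

# IGD: average, over reference points x* in P*, of the distance d(x*, P)
# to the nearest point of the obtained set P.
def igd(P_star, P):
    d = np.sqrt(((P_star[:, None, :] - P[None, :, :]) ** 2).sum(-1))
    return d.min(axis=1).mean()

# Illustrative reference front f2 = 1 - f1, sampled uniformly.
P_star = np.array([[t, 1.0 - t] for t in np.linspace(0.0, 1.0, 101)])
igd_perfect = igd(P_star, P_star)          # obtained set equals the reference set
igd_shifted = igd(P_star, P_star + 0.1)    # every obtained point offset by (0.1, 0.1)
```
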

2) HYPERVOLUME METRIC (HV)
The hypervolume is the volume of the region enclosed by the obtained Pareto-approximation set and a reference point; it is also a comprehensive performance metric [20]. It is defined as

HV(P) = λ( ∪_{x ∈ P} [f_1(x), r_1] × · · · × [f_m(x), r_m] ),

where r = (r_1, ..., r_m) is a reference point in the objective space dominated by all obtained solutions, and λ denotes the Lebesgue measure. The larger the HV value, the better the performance of the algorithm.
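Since HV is a Lebesgue measure, it can be estimated for any number of objectives by Monte-Carlo sampling of the dominated region; the sketch below does so for a two-objective example whose exact hypervolume is 7.0 (the points, reference point, and sample count are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte-Carlo estimate of HV: the Lebesgue measure of the region that is
# dominated by some point of P and bounded by the reference point r.
def hv_mc(P, r, n_samples=200_000):
    P, r = np.asarray(P, float), np.asarray(r, float)
    lo = P.min(axis=0)                        # bounding box [lo, r] contains the region
    U = rng.uniform(lo, r, size=(n_samples, len(r)))
    dominated = (U[:, None, :] >= P[None, :, :]).all(-1).any(-1)
    return dominated.mean() * np.prod(r - lo)

# 2-objective example: rectangles [1,4]x[3,4] (area 3) and [2,4]x[1,4] (area 6)
# overlap on [2,4]x[3,4] (area 2), so the exact hypervolume is 3 + 6 - 2 = 7.
P = [[1.0, 3.0], [2.0, 1.0]]
hv_est = hv_mc(P, r=[4.0, 4.0])
```

Exact sweep-based algorithms are preferred for the 2- and 3-objective cases; the Monte-Carlo form is mainly useful as a many-objective fallback and as an independent check.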

B. COMPARISON OF KMD MECHANISM WITH MULTIPLE ENVIRONMENT SELECTION OPERATORS
To verify the validity of the K-means clustering-based hybrid offspring generation mechanism, we combine the mechanism with three different environment selection operators, obtaining KMDSP, KMDNS and KMDEA. The three environment selection operators are taken from SPEA2 [23], NSGA-II [24] and SMS-EMOA [17], respectively. This paper compares the above six algorithms. The parameter settings of all algorithms are as follows, where the DE parameters and mutation parameters are set as recommended in [6]. SPEA2, NSGA-II and SMS-EMOA require no special parameters beyond the common ones.
From Table 2, we can see that for the GLT test suite with its complicated PF shapes, the algorithms combined with the KMD mechanism obtain the best IGD values on GLT1-GLT5. Although they do not obtain the best value on GLT6, they still perform well on it. This is because SPEA2, NSGA-II and SMS-EMOA ignore the regularity property, so they are not well suited to test suites with complicated PS or PF. Among the three KMD-based algorithms, KMDEA performs best and obtains the optimal values on GLT1, GLT2 and GLT3. This is because the hypervolume indicator is known to be ''Pareto compliant'' [21], and hypervolume-based environment selection has been shown to outperform dominance- and decomposition-based selection [27]. These results show that the K-means clustering-based hybrid offspring generation mechanism is effective in enhancing the performance of MOEAs.

C. COMPARISON BETWEEN KMDEA AND VARIOUS ALGORITHMS
To verify the performance of KMDEA, this paper chooses IMMOEA [25], MOEA/D-CMA [8], RM-MEDA [3] and SMEA [26] as comparison algorithms. All comparison algorithms use the best parameter settings reported in the original literature, and all algorithms are implemented in MATLAB. For the Wilcoxon rank-sum test results, a + behind a result denotes that the compared algorithm is better than KMDEA; a - denotes that the compared algorithm is outperformed by KMDEA; and an = denotes no statistically significant difference between KMDEA and the compared algorithm. The best statistical result for each test problem is highlighted. Table 3 shows the IGD results: IMMOEA, MOEA/D-CMA, RM-MEDA, SMEA and KMDEA obtain 1, 2, 1, 1 and 10 best values, respectively. Table 4 shows the HV results, where the algorithms obtain 1, 4, 1, 1 and 8 best values, respectively. Out of the 30 comparisons (IGD and HV) with each competitor, at the 5% significance level of the Wilcoxon rank-sum test, KMDEA obtains 28, 21, 26 and 26 significantly better results against the four competitors, respectively.

1) STATISTICAL ANALYSIS
In general, compared with IMMOEA, MOEA/D-CMA, RM-MEDA and SMEA, KMDEA has clear advantages on the GLT test suite with its complicated PF and the LZ test suite with its complicated PS. Specifically, IMMOEA, RM-MEDA and SMEA use a single offspring generation mechanism; these algorithms obtain the optimal values on only a few test problems, while KMDEA shows advantages on almost the entire test suite. As for MOEA/D-CMA, although it also uses the DE operator and a covariance matrix adaptation strategy to generate new solutions, its performance is much lower than that of KMDEA. The main reason is that MOEA/D-CMA fails to take full account of the connectivity of the PS and the regularity property of MOPs.

2) SEARCH EFFICIENCY
To analyze the search efficiency of KMDEA, Figure 1 shows the evolution curves of the best IGD values over all 30 independent runs on the GLT test suite.
It can be seen from Figure 1 that KMDEA has a very clear advantage on GLT3 and GLT4. For the remaining GLT test problems, although some comparison algorithms perform similarly to KMDEA, KMDEA still reaches the smallest IGD value at the fastest speed. The comparison shows that, on the GLT test suite, KMDEA has good convergence speed and the highest search efficiency.

3) VISUAL COMPARISON
To further verify the performance of KMDEA, Figure 2 plots the representative PFs (those corresponding to the average IGD values) obtained when MOEA/D-CMA, RM-MEDA, SMEA and KMDEA solve the GLT test suite. Figure 2 shows that on GLT1-GLT4, KMDEA exhibits excellent convergence and diversity compared with the other algorithms. On GLT5 and GLT6, MOEA/D-CMA converges to the true PF but its diversity is poor; SMEA, RM-MEDA and KMDEA can cover the true PFs completely, but the distribution of KMDEA is visibly more uniform. The visual comparison shows that KMDEA performs better overall on the GLT test suite.
In summary, the statistical comparison, convergence-speed analysis and visual comparison of the performance indicators lead to the conclusion that, compared with IMMOEA, MOEA/D-CMA, RM-MEDA and SMEA, KMDEA achieves the best performance on MOPs with complicated PS or PF shapes.

IV. SENSITIVITY TO CONTROL PARAMETERS
The number of clusters in the K-means algorithm directly affects the clustering results. To analyze its influence on KMDEA, we vary the cluster number K (K = 4, 5, 7, 10, 20) with the other parameters unchanged and perform 20 independent runs on the GLT test set. The means and standard deviations of the IGD values obtained by KMDEA with different K values are shown in Figure 3.
As shown in Figure 3, when solving GLT1, GLT2, GLT3 and GLT5, different K values have no significant effect on the mean or standard deviation of the IGD values obtained by KMDEA. When solving GLT4 and GLT6, the K value has only a small effect on the IGD value. Therefore, the number of clusters has little effect on the overall performance: KMDEA is not particularly sensitive to this control parameter and exhibits good robustness.

V. CONCLUSION
To discover the regularity property of MOPs, this paper performs K-means clustering on the population and builds a Gaussian mixture model based on the clustering results. Then, the sampling results of the model are used to expand the mating pool, making up for the shortcomings of a single offspring generation mechanism. Furthermore, to balance local exploitation and global exploration, this paper uses individual quality information to guide the construction of the mating pool. Finally, the K-means clustering-based hybrid offspring generation mechanism multi-objective evolutionary algorithm (KMDEA) is proposed.
In this paper, the K-means clustering-based hybrid offspring generation mechanism is integrated into the environment selection operators of SMS-EMOA, SPEA2 and NSGA-II to verify the effectiveness of the mechanism. KMDEA is then compared with IMMOEA, MOEA/D-CMA, RM-MEDA, SMEA and other mainstream multi-objective evolutionary algorithms; the results show that KMDEA has clear advantages in solving MOPs with complex characteristics. Finally, a parameter sensitivity analysis is carried out, and the results show that KMDEA has good robustness.
The follow-up work will focus on the following points: (a) improvement of clustering algorithm; (b) combining other multi-objective optimization algorithms in the KMDEA framework, such as particle swarm optimization, ant colony optimization, etc.