Characterization of Constrained Continuous Multiobjective Optimization Problems: A Performance Space Perspective

Constrained multiobjective optimization has gained much interest in the past few years. However, constrained multiobjective optimization problems (CMOPs) are still unsatisfactorily understood. Consequently, the choice of adequate CMOPs for benchmarking is difficult and lacks a formal background. This paper addresses this issue by exploring CMOPs from a performance space perspective. First, it presents a novel performance assessment approach designed explicitly for constrained multiobjective optimization. This methodology offers a first attempt to simultaneously measure the performance in approximating the Pareto front and constraint satisfaction. Secondly, it proposes an approach to measure the capability of the given optimization problem to differentiate among algorithm performances. Finally, this approach is used to contrast eight frequently used artificial test suites of CMOPs. The experimental results reveal which suites are more efficient in discerning between three well-known multiobjective optimization algorithms. Benchmark designers can use these results to select the most appropriate CMOPs for their needs.


I. INTRODUCTION
M ANY real-world continuous optimization problems in- volve the optimization of multiple, often conflicting objectives, and constraints that need to be respected [1].Such problems are known as constrained multiobjective optimization problems (CMOPs) and have recently gained much interest in the evolutionary computation community.Indeed, several novel techniques for constraint handling and new test suites of CMOPs have been proposed recently (e.g., [2]- [5]).
Despite the large amount of recently published articles in the field of constrained multiobjective optimization, the CMOPs for benchmarking multiobjective evolutionary algorithms (MOEAs) and corresponding constraint handling techniques (CHTs) are still unsatisfactorily understood and characterized [6], [7].Consequently, the selection of appropriate CMOPs for benchmarking is difficult and lacks a formal background.In the circumstances, preparing a sound and welldesigned experimental setup for constrained multiobjective op-timization is a challenging task.A poorly designed benchmark might lead to inadequate conclusions about CMOP landscapes and MOEA performance [7].
According to [8], there are two main options for characterizing and evaluating the quality of optimization problems, namely through the feature space and performance space.The feature space can be seen as a space of problem characteristics, including basic characteristics such as problem dimensionality and type of objective and constraint functions, as well as more advanced problem characteristics derived using methods developed in the field of exploratory landscape analysis (ELA) [9].On the other hand, the performance space represents the problems based on the obtained algorithm performance (behavior) while solving these problems.Similar to the feature space, basic statistics can be used, such as mean or median algorithm performance, as well as more advanced methods, e.g., data profiles [10] or empirical cumulative distribution functions (ECDFs) [11], [12].In contrast to the aggregated values (means, medians, etc.), the latter two methods consider the progress of the whole algorithm run and, this way, provide more comprehensive information about the algorithm behavior.In our previous work [7], we provided an extensive study of characterizing CMOPs through the feature space, while, to the best of our knowledge, the performance space has not been addressed in sufficient depth yet.
In the literature, the performance indicators used in constrained multiobjective optimization are the same as those used in unconstrained multiobjective optimization, they are simply applied only to feasible solutions.The most frequently employed indicators are the hypervolume indicator [13] and inverted generational distance [14] since they can provide information about the convergence and the diversity of the obtained Pareto front approximation.For monitoring the performance during the run, one can use convergence graphs, data profiles or ECDFs.However, none of these techniques provides relevant information until feasible solutions are discovered.As a result, essential insights about the algorithm behavior and CMOP characteristics are missed.To overcome this situation, some papers report also the constraint satisfaction progress [15].Nevertheless, to the best of our knowledge, no method from the literature simultaneously measures both the convergence towards the Pareto front and constraint satisfaction, making the experimental analysis incomplete.
Moreover, we are aware of only a single work analyzing the CMOPs from the performance space perspective which was conducted in 2017 [16].The authors used five CHTs to characterize five artificial and seven real-world test problems.The results revealed that only a single artificial test problem was suitable for benchmarking algorithms since the other four problems could be solved even without employing a CHT.Additionally, the studied real-world problems were inadequate since they could not differentiate among MOEAs-a desired property of a test problem as it provides relevant information for algorithm designers [8].Since 2017, several novel test suites of CMOPs have been proposed, and their ability to differentiate among MOEAs has not been investigated yet [17].
In this paper, we present a novel anytime performance assessment approach specifically designed for constrained multiobjective optimization.It simultaneously monitors both the Pareto front approximation and constraint satisfaction.The approach is inspired by the anytime performance assessment of algorithms on unconstrained bi-objective optimization problems used in COCO (COmparing Continuous Optimizers) [11].In addition, we propose an approach to measure the capability of a given problem to differentiate among MOEAs.The resulting measure is then used to evaluate the most frequently used artificial test suites of CMOPs.Because of space limitations, we present only selected results in this paper.The interested reader can find the complete results online 1 .
The rest of this paper is organized as follows.In Section II, we provide the theoretical background for constrained multiobjective optimization and introduce the COCO platform Then, in Section III, we extend the performance assessment from the COCO platform to CMOPs and propose an approach to characterize CMOPs based on the algorithm performance.Section IV provides details on the experimental setup, while Section V presents the results, evaluates the existing test suites of CMOPs, and discusses some limitations of the proposed methodology.Finally, a summary of findings and ideas for future work are discussed in Section VI.

II. BACKGROUND
In this section, we provide the theoretical background for this work.After the definitions for CMOPs and constraint violation, we describe the performance assessment approach for multiobjective optimization used in the COCO platform.

A. Constrained Multiobjective Optimization Problems
A CMOP is, without loss of generality, formulated as: where x = (x 1 , . . ., x D ) is a search vector, f m : S → R are objective functions, g i : S → R constraint functions, S ⊆ R D is a search space of dimension D, and M and I are the numbers of objectives and constraints, respectively.In particular, when M = 1 the corresponding problem is a constrained single-objective optimization problem (CSOP).To differentiate between CMOPs and CSOPs, in the later case, we will omit the index m (f = f 1 ).Additionally, if the problem has no constraints, it is called a single-objective optimization problem (SOP) when M = 1, and a multiobjective optimization problem (MOP) when M > 1.
One of the most important concepts in constrained optimization is the notion of the constraint violation.For a single constraint g i it is defined as v i (x) = max(0, g i (x)) and combined for all constraints together as into the overall constraint violation.A solution x is feasible iff its overall constraint violation equals zero (v(x) = 0).Note that other definitions for overall constraint violation exist, and their use would impact the analysis performed in this study.However, the proposed definition for the overall constraint violation is by far the most commonly adopted in constrained optimization [17], and as such, it represents the most appropriate choice.
A feasible solution x ∈ S dominates another solution y ∈ S iff f m (x) ≤ f m (y) for all 1 ≤ m ≤ M and f m (x) < f m (y) for at least one 1 ≤ m ≤ M .Additionally, a solution x * ∈ S is Pareto optimal if there is no solution x ∈ S such that dominates x * .We can generalize the Pareto dominance to sets.A set X dominates set Y iff for each y ∈ Y there exists at least one solution x ∈ X that dominates y.
The set of all feasible solutions is called the feasible region and is denoted by F = {x ∈ S | v(x) = 0}.All nondominated feasible solutions represent a Pareto-optimal set, S o .The image of the Pareto-optimal set in the objective space is the Pareto front and is denoted here by The ideal objective vector, z ide , is defined as the vector in the objective space that contains the optimal objective value for each objective separately and it is expressed as Additionally, the nadir objective vector, z nad , consists in each objective of the worst value obtained by any Pareto-optimal solution.It can be expressed as An additional important concept is the region of interest in the objective space, Z, which represents the set of objective vectors bounded by the ideal and nadir objective vectors.If good approximations for ideal and nadir objective vectors are known, the objective functions can be normalized to ( This way, the objective values are of approximately the same magnitude and the range of the objective values for Paretooptimal solutions is [0, 1].Note that after normalization the z ide consists of m zeros (z ide = (0, . . ., 0)), z nad of m ones (z nad = (1, . . ., 1)), and the region of interest Z equals [0, 1] M .In particular, for SOPs and CSOPs the normalization results in f (x * ) = 0.In the rest of this paper, we assume that all the objective functions are normalized.

B. Empirical Runtime Distributions (ERDs)
The performance measurement approach used in the COCO framework [11], [12] relies on the number of function evaluations 2 -called runtime-needed for an algorithm, a, to reach predefined quality indicator targets.More precisely, we can present an algorithm run after preforming T function evaluations as a sequence of candidate solutions, A T (a) = {x 1 (a), . . ., x T (a)}.Within this framework, a quality indicator, I, is defined as a function mapping A T (a) to a real value.Here, we assume that low quality indicator values indicate better sequences of candidate solutions, and vice versa.Additionally, the runtime for a given quality indicator target equals to the lowest T for which I(A T (a)) reaches the given target precision value, τ .Note that in the following, if there is no ambiguity, we remove the algorithm notation a from A T (a).
In practice, we define several target precision values to understand the algorithm behavior through the entire run.Runtimes can be formally defined as random variables and displayed as an empirical cumulative distribution functioncalled empirical runtime distribution (ERD) in the COCO framework.ERDs are used to display the proportion of target values reached within a specified budget and can be easily aggregated over multiple restarts, runs or even multiple problems.For more details on ERDs, see [11], [12].The runtime data set for an algorithm a and all targets τ -s is denoted as {T a (τ )} τ .Finally, the runtimes in COCO are usually studied in a logarithmic scale and this perspective is used throughout this paper as well.

C. Quality Indicators
Based on the nature of the optimization problem there are various quality indicators used by the COCO framework.Those relevant for this work are as follows.
1) Single-objective optimization: In this case, the quality indicator is the best so far observed objective function value: 2) Constrained single-objective optimization: The quality indicator for unconstrained problems ( 6) is extended by the addition of the overall constraint violation as follows: 3) Multiobjective optimization: The quality indicator for MOPs consists of two parts.When no solution from the sequence A T dominates the nadir point (reference point), the distance to the region of interest Z is used to measure the quality of the solutions (see Fig. 1b).In contrast, when at least one of the solutions dominates the nadir point, the hypervolume indicator is used instead (see Fig. 1c).This quality indicator can be mathematically expressed as: 2 When we refer to a function evaluation, we actually mean the evaluation of all the objective and constraint functions.For example, for a bi-objective problem with three constraints we need to perform five evaluations, however, we count this as a single function evaluation. Here, represents the hypervolume of the archive A T , N (A T ) is the set of all the points from A T dominating the reference point which equals (1, . . ., 1), and is the smallest Euclidean distance between the archive and the region of interest Z.Additional information on this quality indicator can be found in [12].

D. Target Precision Values for MOPs
For each problem a set of quality indicator target values is chosen, which is used to measure algorithm runtimes and in turn to calculate ERDs.The target values are computed in the form of τ (ε) = τ ref +ε, where τ ref is a reference I MOP value.It is either based on the hypervolume of the true Pareto front or an estimation for it.In COCO, 51 positive target precision values ε ∈ {10 −5 , 10 −4.9 , . . ., 10 −0.1 , 10 0 } are chosen 3 .Note that it is not uncommon that the quality indicator value of the algorithm never reaches some of these target values, which leads to missing runtime measurements.

III. METHODOLOGY
This section provides an extension of ERDs to constrained multiobjective optimization.It also discusses an approach to measure the problem's effectiveness to distinguish algorithms based on the distance between ERDs.

A. Quality Indicator for CMOPs
There are two main paradigms to approach constrained optimization problems.The first one is applicable when the constraints must be satisfied at any cost, while the second one allows for partial violation of constraints if the objective values of a solution are of good quality.Although both paradigms have their pros and cons, we study the former one as this is the prevalent approach in the literature 4 .
Furthermore, in many real-world scenarios the objective values cannot be calculated if the solution is infeasible [18].This often happens in simulation-based optimization where the simulator cannot return meaningful results if some of the constraints are not satisfied.Consequently, our main assumption in constructing a quality indicator for constrained multiobjective optimization is that an infeasible solution is strictly worse than any feasible solutions regardless of the objective value quality-this is exactly the pillar of the first paradigm.For example, in Fig. 1c, solution z 4 has better objective values than z 5 .Actually, if no constraints were considered, z 4 would dominate z 5 .Nevertheless, z 5 is considered to be strictly better than z 4 .This desired property of a quality indicator can be mathematically expressed as follows: The quality indicator for CSOPs (7) does not satisfy this property.The biggest disadvantage of an indicator not satisfying (11) is that no matter how small the quality indicator value is, we cannot know whether there exists a feasible solution in A T .For example, for a certain CSOP there might exist a solution in A T with f (x) = 0 and an arbitrarily small overall constraint violation value.In other words, unless I CSOP equals zero (x * ∈ A T ), we cannot know for certain whether we found a feasible solution relying solely on the quality indicator values.From a practical point of view, we wish for a quality indicator to involve a threshold that unequivocally indicates when the algorithm reached the feasible space.
Considering this, we propose an extension of the quality indicator for MOPs (8) as follows: where τ * is a threshold to indicate that the feasible space was reached.For example, in the COCO framework, it can be set to the largest considered target for MOPs, which equals 1.It is also obvious that I CMOP satisfies the property (11).The behavior of the proposed quality indicator is illustrated in Fig. 1.
Additionally, note that in (12) only feasible solutions are considered in the calculation of I MOP .This can be seen in Figs.1b and 1c where the infeasible solutions are not considered for the calculation of the distance and hypervolume, respectively, once feasible solutions have been found.

B. Performance Space Comparison
According to [8], a good test suite should include problems that "enable the user to tell the algorithms apart in the performance space".To measure the ability of a given problem to differentiate among MOEAs, we rely on the area between the corresponding ERDs (see Fig. 2, area in gray).The intuition is that a large area between ERDs indicates large differences between the runtimes and in turn between the algorithms.
Based on the area between the two ERDs we want to propose a metric, ∆, that with a single number provides information about the similarity between algorithms and their performance.Let us assume we are dealing with two algorithms a and b, and we have their corresponding runtime data sets for a certain optimization problem {T a (τ )} τ and {T b (τ )} τ .Then, the area of a single segment between the runtimes (in the logarithmic scale) for the same target can be obviously calculated as follows (see Fig. 2, the bold line between runtimes): where N τ is the number of targets.In particular, when a certain runtime is missing, we set its value to the maximal budget (number of function evaluations).This is done for calculation purpose and has no particular meaning.Using the formula (13), the area bounded by the two ERDs and thus ∆ can be expressed as the sum of these segment areas over all the targets: where N f is the number of function evaluations.The formula is additionally divided by log N f for normalization purposes.It can be easily seen that ∆(a, b) ∈ [0, 1] for all algorithms and problems.In particular, small values indicate similar behavior of the chosen algorithms, and vice versa.For example, in the extreme case algorithm a might solve all the targets within a single evaluation, while the algorithm b does not reach any target.In this case ∆(a, b) = 1.In the second extreme case all the runtimes coincide and ∆(a, b) = 0.
The ∆(a, b) metric can be additionally expressed as: where ∆ − and ∆ + represent the sum of segments ( 13) over targets measuring constraint satisfaction, τ − , and targets expressing the algorithm effectiveness in approximating the Pareto front (called front approximation for short), τ + , respectively.Additionally, N τ + is the number of τ + targets, and N τ − the number of τ − targets.In particular, ∆ − can be seen as a measure of algorithm differences in constraint satisfaction, while ∆ + is a metric measuring differences in front approximation.

IV. EXPERIMENTAL ANALYSIS
This section introduces the test suites of CMOPs used for the experiments, discusses the chosen MOEAs and their CHTs, and provides the parameter and implementation details.

B. Multiobjective Evolutionary Optimization Algorithms
Three well-known CMOEAs were used to investigate the proposed assessment methodology and to compare the test suites: NSGA-III [21], [24], C-TAEA [23] and MOEA/D-IEpsilon [15] all equipped with their default CHTs.
NSGA-III is a well-known algorithm that uses the constrained domination principle (CDP) [25] as a CHT.This principle is an extension of the dominance relation and is the most widely-used technique to solve CMOPs.It strictly favors feasible solutions over infeasible ones.While feasible solutions are compared based on Pareto dominance ( ), infeasible solutions are compared according to the overall constraint violation.The formal definition of CDP, as presented in [3], is the following: Next, the CHT used in MOEA/D-IEpsilon is based on the ε-constraint relation: where is the Tchebycheff aggregation function.
The comparison level ε t is updated in each generation following the expression: where t is the generational counter, τ , α, T c are user-defined parameters, v(x θ ) is the overall constraint violation of the top θ-th individual (according to overall constraint violation value) in the initial population, and ρ F (P t ) is the proportion of feasible solutions in the current population P t .Additional details on MOEA/D-IEpsilon can be found in [15].Finally, the main idea behind C-TAEA is the maintenance of two separate archives.One archive is used to promote convergence, while the other one to maintain diversity.Besides, a special restricted mating approach is employed to balance between the two archives.The CHT used by C-TAEA is incorporated within the update of the convergence archive.Similarly to CDP, this CHT strictly favors feasible solutions which are compared based on Pareto dominance.On the other hand, the infeasible solutions are ranked using nondominated sorting for a custom bi-objective problem expressed as minimize (v(x), g tc (x | ν)). ( The convergence archive is updated with all feasible solutions and the best infeasible solutions according to the Pareto ranking applied in (20).In addition, the diversity archive does not consider feasibility at all allowing infeasible solutions to persists in the population.More information on this method are available in [23].
We chose three MOEAs with complementary CHTs: (i) the CDP employed in NSGA-III strictly favors feasible solutions, (ii) the diversity archive in C-TAEA allows infeasible solutions to remain in the population, and (iii) MOEA/D-IEpsilon adaptively updates the comparison level each generation following (19); when the feasibility ratio of the current population becomes large, ε t increases and progressively more solutions (infeasible ones included) are compared solely according to the objective values.

C. Parameter Settings
The proposed performance assessment is demonstrated on on the test suites listed in Table I.In particular, three-objective C-DTLZ and DC-DTLZ problems were considered with the default number of constraints.Additionally, a difficulty triplet of (0.5, 0.5, 0.5) was used for the DAS-CMOP suite as this is by far the most frequently used difficulty triplet in the literature.
Three dimensions of the search space D ∈ {5, 10, 30} were used to evaluate the proposed performance assessment methodology and compare the test suites.All the Pareto fronts can be analytically expressed and the corresponding hypervolume values exactly calculated.
All MOEAs were run with an equal population size, N p , and the same number of generations, N g .In particular, the population size was set to N p = 100M .The number of generations was set to N g = 120D/M and was selected as approximately the minimal value to obtain convergence for all the MOEAs (in total, 12000D function evaluations).Note, the division by M in the expression for N g is necessary to enable aggregation over CMOPs with different number of objectives.The resulting values for N p and N g are shown in Table II.
Other parameters of the algorithms and their operators were set to their default values [15], [23], [24]: The polynomial mutation was used in all the MOEAs.The mutation probability was set to 1/D and the distribution index to 20.Specifically, the simulated binary crossover was used in NSGA-III and C-TAEA with the crossover probability of 1 and the distribution index of 30.In contrast, a differential-evolution-based crossover was used in MOEA/D with a crossover probability of 1 and scaling factor of 0.5.Additionally, in MOEA/D, the neighborhood size was set to 30, probability of neighborhood mating to 0.9, the maximal number of solutions replaced by a child to 2, τ to 0.1, α to 0.95, T c to 0.8, and θ to 0.05N p .
Finally, ERDs were computed without employing restarts or bootstrapping.

D. Target Precision Values for CMOPs
The values of the distance metric d defined in (10), as well as those of the overall constraint violation v can result in different magnitudes for different CMOPs.Consequently, it is impossible to define a set of target precision values that would provide meaningful results for all the studied CMOPs.As we wanted to compare different CMOPs and suites, we first sampled 100 solutions {x i } i for each CMOP and normalized d and v as follows: and v = d/10 log(med({v i }i)) (22) where med({d i } i ) and med({v i } i ) are median values of the sets {d i } i = {d(x i , Z)} i and {v i } i = {v(x i )} i , respectively.After applying this procedure and preforming several experiments, we set τ * to 1. Additionally, a good set of target precision values for The first half of target precision values τ + applies to feasible solutions and represents how well the algorithm approximates the Pareto front, while the second half of the targets τ − is used to understand the algorithm performance in satisfying the constraints.

E. Implementation Details
All the CMOPs, MOEAs and the performance measurement procedure were implemented in the Python programming language [26].We used the pymoo [27] implementation for CTP, DAS-CMOP, MW, NSGA-III, C-TAEA and hypervolume calculation.The rest of the suites, MOEA/D-IEpsilon and other functionalities were implemented from scratch.

V. RESULTS AND DISCUSSION
In this section, we first present the experimental results.Next, we discuss the existing test suites of CMOPs and their potential in differentiating algorithms.Finally, we present some limitations of the proposed methodology.

A. Results
Let us first look at the results for a single problem.As an example we select the MW13 problem.Fig. 3 shows the ERDs for this problem aggregated over 30 runs for each of the three algorithms.To aid visualization and comparison, the function evaluations (x-axis) are divided by problem dimension and shown in logarithmic scale.
The horizontal dashed line divides the targets into τ − and τ + .Note that this intuition is true only for a single run without aggregation.Nevertheless, if an ERD (single or aggregated) starts above this line, then all the algorithm runs started in the feasible region-the first initialized solution is feasible.On the other hand, if an ERD starts below the dashed line and never crosses it, then this indicates that the corresponding algorithm runs never reached the feasible region.Moreover, the line is thicker when some, but not all runs have found feasible solutions.
As we can see, all the MOEAs reach all the targets for the two smaller dimensions, while NSGA-III fails to reach some targets on 30-D problems.All the algorithms are able to find feasible solutions in all of the runs with NSGA-III being generally quickest in this regard.The aggregated results over all problems of a suite are shown in Figs. 4 and 5 on the left hand side of each subfigure.For example, Fig. 4a shows the ERDs for the selected MOEAs aggregated over all problems from the CTP suite in 5-D.On the right hand side of each subfigure we see violin plots of distributions for ∆ + (top left), ∆ − (bottom left), and ∆ (right) values.Each of these values was computed for 30 runs of each pair of algorithms on each problem and is represented in the plot as a black dot.The first column shows the distributions for all the considered CMOPs and the rest of the columns correspond to each suite separately.The y-axis depicts the ∆ + , ∆ − , or ∆ value, while the x-axis has no specific meaning and is used solely for better visualization.The violin plot (colored area) approximates the probability density function for ∆ + , ∆ − , or ∆ values.For example, Fig. 5j shows there are more problem instances in the MW suite with ∆ ≈ 0.05 than those with ∆ ≈ 0.10.In addition, there are no problem instances with ∆ > 0.25.

B. Test Suites Evaluation
As already discussed in Section III-B, a well-designed test suite should include a wide variety of problems that can differentiate among MOEAs (∆).Since we are dealing with constrained problems, we are particularly interested in evaluating the ability of the problems to differentiate among the algorithms with respect to constraint handling (∆ − ).
• CTP: As we can see, for all problems the algorithms start in the feasible space (Figs.4a, 4b, and 4c).The main difficulty they face is front approximation.This is additionally confirmed by the violin plots showing that ∆ − equals 0 for all dimensions.• CF: Unlike for CTPs, the MOEAs find no feasible solutions at the very beginning of the evolution process (Figs.4d, 4e, and 4f).Interestingly, the difference in algorithm performance originates mainly in constraint satisfaction.• C-DTLZ: From the performance space perspective, this suite is well-designed.The algorithms struggle to find feasible solutions in the initial phase of the evolution process (Figs.4g, 4h, and 4i).In addition, the suite can differentiate between algorithms in both constraint satisfaction and front approximation.
• NCTP: Although the three MOEAs need a large number of function evaluations to reach a feasible region, their main challenge is front approximation (Figs.4j, 4k, and  4l).The vast majority of the difference in algorithm performance also comes from front approximation.• DC-DTLZ: Figs.5a, 5b, and 5c show that all the three MOEAs struggle in obtaining feasible solutions, which are discovered only late in the evolution process.Like CFs, the DC-DTLZ suite is especially good at differentiating the constraint satisfaction part of algorithm performance.
• DAS-CMOP: As we can see, for all the DAS-CMOPs the NSGA-III algorithm always starts with a feasible solution, while this is not true for other two MOEAs (Figs. 5d, 5e, and 5f).Nevertheless, feasible solutions are easily discovered by all the algorithms.Moreover, algorithm performance differences are almost exclusively obtained in front approximation, since ∆ − ≈ 0 for all problems and MOEAs.
• LIR-CMOP: The performance space characteristics of this suite are very similar to those of NCTPs.Although the studied algorithms need some time to find feasible solutions, the main difference in the algorithm performance is contained in the front approximation phase (Figs.5g, 5h, and 5i).• MW: From the performance space perspective, MW is one of the most versatile and well-designed artificial test suites found in the literature.It is the best among the studied suites in differentiating the three MOEAs (Figs. 5j, 5k, and 5l).Moreover, as shown by the violin plots, the algorithm performance is diverse in both constraint satisfaction and front approximation.

C. Limitations
We see two potential limitations of the proposed methodology to evaluate the performance space.Firstly, the results can be severely effected by the selection of the algorithms and their budgets, and secondly, the choice of target precision values also has a great impact on the outcome.
To alleviate the first issue, we selected three different MOEAs equipped with distinct CHTs.Additionally, the number of function evaluations was set large enough to assure convergence thus revealing the deviations between the algorithms.Finally, the logarithmic scale was applied to the budget to not bias the results towards the tail of the convergence graphs where the algorithms have already converged.
On the other hand, we were not able to satisfactorily address the issue of some targets having greater impact than others.For example, there is no assurance that progressing from target 1 + 10 −4.2 to target 1 + 10 −4.3 is equally important or difficult as progressing from target 1 + 10 −4.3 to target 1 + 10 −4.4 for all the problems and algorithms at hand.Nevertheless, using the target approach and logarithmic scale to define these targets is argued to be much more efficient to compare algorithm performance than just relying on regular convergence graphs [12].

VI. CONCLUSIONS
This paper presents a holistic investigation of the existing artificial test CMOPs from a performance space perspective.Firstly, we have proposed a performance assessment methodology capable of simultaneously monitoring both the front approximation and constraint satisfaction.This methodology is an extension of the approach used by the COCO platform for unconstrained bi-objective optimization problems.Next, the resulting performance methodology has been used to analyze and contrast eight artificial test suites.In particular, the test suites have been assessed with respect to the effectiveness of differentiating between three well-known MOEAs.Finally, the paper discusses the advantages and drawbacks of the existing artificial test suites and discloses some limitations of the proposed methodology.
The experimental results show that the CF, DC-DTLZ, and especially MW suites have the greatest potential in differentiating the three MOEAs.They all include multiple CMOPs that can separate the MOEAs in both front approximation and constraint satisfaction.Additionally, our findings indicate that half of the artificial test suites fail to satisfactorily differentiate among the three MOEAs.This suggests that CMOPs from those suites provide limited information for the algorithm designer and are thus of little value for the benchmarking purpose.Finally, we saw that the predominant source of complexity in artificial test CMOPs is the front approximation.
As for the future work, we suggest to extend the proposed methodology to CMOPs with more than three objectives.In particular, since the hypervolume calculation is expensive in high-dimensional objective spaces, one could investigate the effect of using different performance indicators, e.g., inverted generational distance, epsilon indicator, etc.Additionally, the potential of the proposed methodology in studying algorithm behavior while solving real-world problems should also be addressed.Measuring algorithm performance in this case is especially challenging as in a real-world scenario the Pareto front is usually unknown.Finally, the proposed methodology should be tested with additional MOEAs to further support our findings.

Fig. 1 .
Fig. 1.The quality indicator I CMOP at three stages of the algorithm search: (a) All the solutions belong to the infeasible space (areas in gray) and the quality indicator relies on the overall constraint violation.(b) There exists at least one feasible solution but no solutions dominate the reference point z nad .The quality indicator relies on the distance to the region of interest Z (area bounded by the doted lines and the coordinate axes).(c) There exists at least one feasible solution dominating the reference point.The quality indicator is based on the hypervolume (area depicted with a mesh).

Fig. 2 .
Fig. 2. ERDs corresponding to algorithms a (solid line) and b (dashed line).The area between the lines (in gray) represents the difference in algorithm performance, ∆(a, b).

Fig. 4 .
Fig. 4. Results of the three MOEAs on CMOPs from CTP, CF, C-DTLZ, and NCTP suites.The left plot of each subfigure shows empirical runtime distribution aggregated over all CMOPs in the suite and all targets in dimension 5 (left), 10 (center) and 30 (right).On the right of each subfigure, violin plots depict distributions of ∆ + (top left), ∆ − (bottom left), and ∆ (right) values.The larger the diversity, the better.

Fig. 5 .
Fig. 5. Results of the three MOEAs on CMOPs from DC-DTLZ, DAS-CMOP, LIR-CMOP, and MW suites.The left plot of each subfigure shows empirical runtime distribution aggregated over all CMOPs in the suite and all targets in dimension 5 (left), 10 (center) and 30 (right).On the right of each subfigure, violin plots depict distributions of ∆ + (top left), ∆ − (bottom left), and ∆ (right) values.The larger the diversity, the better.

TABLE I CHARACTERISTICS
OF TEST SUITES: NUMBER OF PROBLEMS, DIMENSION OF THE SEARCH SPACE D, NUMBER OF OBJECTIVES M , AND NUMBER OF CONSTRAINTS I .

TABLE II THE
POPULATION SIZE Np AND NUMBER OF GENERATIONS Ng USED IN THE EXPERIMENTAL ANALYSIS, BASED ON THE NUMBER OF OBJECTIVES M AND THE DIMENSION OF THE SEARCH SPACE D.