Fractal Decomposition Approach for Continuous Multi-Objective Optimization Problems

Multi-objective optimization problems (MOPs) have been widely studied during the last decades. In this article, we present a new intrinsically parallel approach based on Fractal decomposition (FDA) to solve MOPs. The key contribution of the proposed approach is to divide recursively the decision space using hyperspheres. Two different methods were investigated: the first one is based on scalarization that has been distributed on a parallel multi-node architecture virtual environments and taking profit from the FDA’s properties, while the second method is based on Pareto dominance sorting. A comparison with state of the art algorithms on different well known benchmarks shows the efficiency and the robustness of the proposed decomposition approaches.


I. INTRODUCTION
Many problems in science and industry are concerned with multi-objective optimization problems (MOPs). Multiobjective optimization seeks to optimize several components of an objective function vector. Contrary to single-objective optimization, the solution of a MOP is not a single solution, however, a set of solutions known as Pareto optimal set, which is called Pareto front when it is plotted in the objective space. Any solution of this set is optimal in the sense that no improvement can be made on a component of the objective vector without worsening at least another of its components. The main goal in solving a difficult MOP is to approximate the set of solutions within the Pareto optimal set and, consequently, the Pareto front.
Multi-objective metaheuristics can be classified in three main categories: The associate editor coordinating the review of this manuscript and approving it for publication was Halil Yetgin .
• Scalarization-based approaches: this class of multiobjective metaheuristics contains the approaches which transform a MOP problem into a single-objective one or a set of such problems. Among these methods one can find the aggregation methods, weighted metrics, Tchebychev method, goal programming methods, achievement functions, goal attainment methods and the -constraint methods [14], [19], [20], [22].
• Dominance-based approaches: the dominance-based approaches 1 use the concept of dominance and Pareto optimality to guide the search process. Since the beginning of the nineties, interest concerning MOPs area with Pareto approaches always grows. Most of Pareto approaches use EMO (Evolutionary Multicriterion Optimization) algorithms. Population-based metaheuristics seem particularly suitable to solve MOPs, because they deal simultaneously with a set of solutions which allows to find several members of the Pareto optimal set in a single run of the algorithm. Moreover, they are less sensitive to the shape of the Pareto front (continuity, convexity). The well known NSGA-II algorithm [6] belongs to this category. It uses a crowded comparison method to select suitable individuals. An improved version of the crowding distance has been proposed in [10] and shows to improve NSGA-II performances. The main differences between the various proposed approaches in this category arise in the followings search components: fitness assignment, diversity management, and elitism [18], [32], [36].
• Decomposition-based approaches: most of decomposition based algorithms in solving MOPs operate in the objective space. One of the well-known frameworks for MOEAs using decomposition is MOEOA/D [8], [16], [33]. It uses scalarization to decompose the MOP into multiple scalar optimization subproblems and solve them simultaneously by evolving a population of candidate solutions. Subproblems are solved using information from the neighboring subproblems [21]. Many variations have been proposed such as GWASFGA [29], an evolutionary algorithm based on achievement scalarizing function using the Tchebychev method. A recent approach called CDG [1] is also a decompositionbased MOEA. Instead of using a traditional scalarization method such asTcheybycheff, CDG-MOEA uses a constrained decomposition with grids.
We are interested in tackling MOPs using a decomposition of the decision space. In our previous work, we proposed a new metaheuristic for single objective optimization called Fractal Decomposition Algorithm (FDA) [23] which is based on hyperspheres Fractals. It is a deterministic metaheuristic developed to solve large-scale continuous optimization problems. As pointed out, the main principle of the approach consists of dividing the feasible search space into sub-regions with the same geometrical pattern. Hyperspheres were chosen as geometrical form because it has the benefit of scaling easily when the dimension of the problem increase. In this article we propose two algorithms that extend FDA to solve MOPs: • Scalarization-based approach (Mo-FDA-S): Mo-FDA-S adapts FDA using scalarization techniques. This approach has also been developed to benefit from a multi-node parallel environment to improve the computational time taken to solve MOPs problems. This chosen architecture benefits from containers, light-weight virtual machines that are designed to run a specific task. . The paper is organized as follow. Section II defines multi-objective optimization. Section III recalls the main principles of the Fractal decomposition metaheuristic FDA. Then, the scalarization-based FDA algorithm is presented in section IV. The dominance-based FDA algorithm is detailed in section V. In section VI the experimental settings and computational results against competing methods are detailed and analyzed. Finally, the section VII concludes and presents some future works.

Definition 1 (MOP):
A multi-objective optimization problem (MOP) may be defined as: where k (k ≥ 2) is the number of objectives, x = (x 1 . . . , x n ) is the vector representing the decision variables, and X represents the set of feasible solutions associated with equality and inequality constraints, and explicit bounds.
) is the vector of objectives to be optimized. In this article, we considered the case of continuous optimization: the decision variables used in the objective functions are required to be continuous variables (X ⊆ R).
The set of all values satisfying the constraints defines the feasible region X and any point x ∈ X is a feasible solution.
As mentioned before, we seek for the Pareto optima.
This definition states that x * is Pareto optimal if no feasible vector x exists which would improve some criterion without causing a simultaneous worsening in at least one other criterion.
Definition 5 (Pareto Front): For a given MOP f ( x) and its Pareto optimal set P * , the Pareto front is defined as PF * = { f ( x), x ∈ P * }. Definition 6 (Reference Point): A reference point z * = [z 1 , z 2 , . . . , z n ] is a vector which defines the aspiration level (or goal) z i to reach for each objective f i .

III. FRACTAL DECOMPOSITION ALGORITHM: A RECALL
The Fractal Decomposition Algorithm [23] (FDA) is a divideand-conquer based algorithm that has been designed to solve large-scale single objective continuous optimization problems. FDA builds a search tree of promising optimum areas of a depth k (called Fractal depth), by dividing the search space recursively using geometrical hyperspheres. Once FDA divided the search space using hyperspheres as an elementary geometric form, then, a search tree is built. While, navigating through the tree FDA identifies, at each decomposition level, candidate hyperspheres: areas where the global optimum could be found. This principle is illustrated in Fig. 1 with in case of a four-level decomposition, the red hypersphere being the best one at each level. Once the maximum fractal depth k is reached, a local search is triggered into promising hyperspheres to find the optimal solution.
Three main phases compose the algorithm: 1) The initialization phase; 2) The exploration phase and 3) The exploitation phase. The first phase aims to initialize the biggest hypersphere possible at the center of the search space and within its boundaries (Upper and lower bound). Once the first hypersphere is set, FDA starts the exploration phase. The algorithm will decompose the first hypersphere into 2 × D sub-hyperspheres, and apply the promising hypersphere selection procedure to all of them. It was designed to find the most promising regions. To do so, their attractiveness is approximated as in [23]. Then, all sub-hyperspheres are sorted by their scores, and the best one is selected to be further decomposed. The whole procedure is called at each level until the maximum fractal depth k is reached.
The exploitation phase starts when FDA has reached the maximum depth k. The Intensive Local Search (ILS) starts to explore intensively the sub-hyperspheres. ILS starts at the center of each sub-hypersphere and moves along each dimension sequentially, evaluating two solutions x s 1 and x s 2 as expressed in (2) and (3), respectively.
where e i is the unit vector where the i th element is set to 1, and other elements to 0. ω is the step-size in which e i changes. The best solution among x s , x s 1 and x s 2 is chosen to be the next current solution x s , then, moves to the next dimension. Once all dimensions have been exploited, if no improvement has been found on the current best solution found, the ω is reduced by a factor 1 λ . ILS stops where either the stopping criterion is reached or the ω has reached the tolerance threshold ω min . Once all the sub-hyperspheres have been visited by ILS, FDA stops if the stopping criterion was reached or backtracks in the search tree and select the next sub-hypersphere of the level k − 1 creating a new branch in the search tree. The whole algorithm is presented in Algorithm 1. The parameters of FDA are: the Fractal depth k = 5, the coefficient step size λ = 0.5, the inflation coefficient α = 1.75 and the tolerance threshold ω min .

IV. SCALARIZATION-BASED FRACTAL DECOMPOSITION
The aggregation (or weighted) method is one of the most popular scalarization method for the generation of Pareto optimal solutions. It consists in using an aggregation function to transform a MOP into a single objective problem (MOP λ ) by combining the various objective functions f i into a single objective function f generally in a linear way: where the weights ω i ∈ [0..1] and k i=1 ω i = 1. The first proposed scalarization approach Mo-FDA-S (Multi-Objective Fractal Decomposition Algorithm Scalarization) uses the Tchebychev function [22]. It introduces the concept of ideal point or reference point z * i as follows: where z * = (z * 1 , . . . , z * k ) is the reference point, and ω = (ω 1 , . . . , ω k ) is the weight vector.
By using N different weight vectors ω, Mo-FDA-S solves N different problems, each generating one solution composing the final Pareto Front (PF). One of the downsides of using scalarization methods is that the number of solutions composing the PF found by the algorithm will be, at most, the same as the number of different weight vectors N. In certain cases, if two or more weight vectors ω are too close, the algorithm might find the same solution.

A. IMPLEMENTATION ON PARALLEL ARCHITECTURE
This approach take profit intrinsically from parallel architectures. Indeed, the algorithm is launched N times with N variations of the weight vector ω. If we consider that the number of function evaluations (FE) is used as a stopping criterion, even though, each instance only has Max fe N FE, the computational time can increase significantly. To overcome this, a multi-node architecture has been developed for Mo-FDA-S. The idea is to have each node finding one solution corresponding to one combination of the weights ω and combine all their results to build the final Pareto front. Go to next level: l = l + 1 end end Result: the best solution BestSol and its coordinates The challenge behind this architecture is that the computing resources needed increase with the size of the Pareto-Front. For instance, if N = 100, it means that 100 nodes would be required, hence 100 different computers (or virtual machines), which can be seen as an oversized architecture. To tackle this important issue we have decided to develop the approach using containers and specifically the powerful combination of docker as the container technology with kubernetes as an orchestrator as shown on Fig. 3. Containers are significantly lighter than virtual machines as they all share the same operating system kernel. This way, a single machine can host more containers than virtual machines. This architecture is significantly lighter than a traditional one and allows to benefit from multi-node approaches while developing it on a limited number of hosts. In addition, containers can be deployed on multiple different physical (or virtual) machines seamlessly, without having to change the structure of our algorithm. Kubernetes is the leading open-source solution for container-orchestration and takes care, in our case, of the creation and deployment of all the containers on the different hosts without changing anything in the algorithm implementation. Table 1 and Fig. 2 show the obtained computation time. This example considers the time to solve a function in dimension D = 30 with 100 different weights vectors ω on different number of virtual hosts N. It is important to indicate that even on two hosts, computation gain can be observed, however not as important as when the number of hosts increases. This is because each host has to handle, in this case, 50 different containers. Moreover, when N increases significantly, the gain in time is significant compared to the sequential version but at some point, the increase in compute nodes does not decrease the computational time. This is due to the communication overhead required to synchronise all nodes and gather all points compositing the PF. All tests have been done on a cluster of machines with the following characteristics: one 3.1 GHz Intel Xeon R Platinum 817 processor with 16GB of RAM. Moreover, Mo-FDA-S has been developed in Python and uses the library MPI (Message Passing Interface).

V. DOMINANCE-BASED FRACTAL DECOMPOSITION
The idea behind Mo-FDA-D (Multi-Objective Fractal Decomposition Algorithm Dominance Based) is to keep the structure of the framework provided by the original version of FDA [23], i.e. the geometric fractal decomposition as well as the geometric form and the different phases composing the original version of FDA. The procedure to evaluate hyperspheres and the ILS heuristic to conduct the local search at the maximal fractal depth k have been extended to solve MOPs.

A. MULTI-OBJECTIVE HYPERSPHERE SELECTION (EXPLORATION STRATEGY)
This procedure aims to select the most promising region that might contains Pareto solutions. The aim is to find both the most promising region to be further decomposed but also to find potential non-dominated points composing the final Pareto Front (PF). To do so, we evaluated multiple points along each dimension using the following equations: where C (l) is the coordinates of the center of hypersphere being evaluated, r is its radius, γ ∈ [1, 3] and e i is the unit vector at the dimension i. This is illustrated by 4 with a twodimensional example by two different scenarii: • (a) 2 × D points plus the center are evaluated with γ = 1, meaning that the points are on the sphere.
• (b) 6 × D points plus the center are evaluated with γ = {1, 2, 3}. Solutions are on and within the hypersphere. All evaluated solutions are stored in a temporary list to be sorted. The sorting is based on the Pareto dominance and only non-dominated points are kept, producing a local Pareto front of locally non-dominated solutions within the hypersphere. The sorting algorithm used to sort evaluated points and generate the local PF is called Simple Cull [11]. Once the local PF obtained, all points are compared to the Nadir point of the objective space and all points above are excluded from the PF.
We have used the hypervolume indicator as a quality indicator to evaluate the sub-hyperspheres. To do so, we compute the hypervolume of the local PF with regards to the Nadir point, z nad [26]. The hypersphere with the highest value is considered better than the other hyperspheres of the same level and will, therefore, be selected to be further decomposed. Once all the hyperspheres of a given level have been evaluated, all the locally non-dominated sets are concatenated and sorted again to find the global PF of non-dominated solutions.

B. MULTI-OBJECTIVE INTENSIVE LOCAL SEARCH
Once the fractal depth k is reached, the intensive local search (ILS) is triggered. In this context, it is important to notice that any multi-objective local search or metaheuristic could be used at this step. Instead of searching locally within hyperspheres of the last level, ILS iterates around each nondominated solutions found so far during the exploration phase of evaluating hyperspheres. Therefore, the entry point of one ILS instance is one solution of the current global Pareto Set.
ILS starts by creating two empty lists, one for the Pareto Set listNewPS (decision space) and one for their corresponding solutions in the Pareto front denoted listNewPD (objective space) and insert in the first list the point given as input parameter. Then for each dimension and each point in listNewPS, ILS will produce two additional points denoted x L and x R . They stand in opposite directions from the current point being exploited, x C at equal distance ω, also called step size as per the following equations: where e i is the unit vector which the i th element is set to 1 and the other elements to 0, x C , x L and x R are then evaluated and inserted in listNewPS and their corresponding solutions in the objective space, respectively F( x C ), F( x L ) and F( x R ) are added to the listNewPF list. Once all the points in listNewPS have been exploited for the current dimension, the non-dominated sorting algorithm is applied to the list of all potential solutions (listNewPF) and this will generate a new local Pareto Front of non-dominated solutions. Hence listNewPF only contains a set of non-dominated solutions and listNewPS only contains their equivalent in the decision space. At the end of each iteration, once all dimensions have been searched, ω is multiplied by a coefficient defined as a hyperparameter of Mo-FDA-D. This is repeated until either: • the stopping criterion is reached; • or ω has reached its minimum value ω min and therefore ILS moves on to the next point in the Pareto set. Once ILS has finished searching around each point of the Pareto set, all points found during ILS are sorted and only the non-dominated points will remain and will compose the new global Pareto-Set. Either the stopping criterion is reached and Mo-FDA-D has finished or the backtracking procedure is applied and a new sphere from the level k -1 is selected to be decomposed. The whole procedure is illustrated in Algorithm 2.

VI. PERFORMANCE ANALYSIS
In this section, the two proposed algorithms, Mo-FDA-S and Mo-FDA-D, are analyzed and their performance is assessed using different benchmarks (ZDT [34], DTLZ [4], Fonseca-Fleming [9]). In the literature different metrics have been developed to measure the quality of the Pareto sets obtained by different algorithms [28]. Each metric measures a different characteristic of a Pareto front [27]: • Convergence (or accuracy), i.e. the closeness (i.e distance) to the optimal or best known Pareto front; • Cardinality, i.e. the number of points in the Pareto front; • Diversity, i.e. the distribution of the front. The points in a Pareto front should be well spread over the objective space. We have chosen to focus on the four most commonly used metrics evaluating all aspects of a Pareto front: • The Hypervolume computes the volume of the objective space that is dominated by a reference point [35].
• The Generational Distance metric (GD), computes the average distance from the obtained set to the true Pareto front [28].  • The Spread measures how well spread the nondominated solutions are over the objective space [6].
The following sections carry out the sensitivity analysis of the proposed multi-objective FDA algorithms to their parameters, and a comparison to some popular multi-objective evolutionary algorithms (e.g. NSGA-II, MOEAD/D).

A. IMPACT OF THE SCALARIZATION METHOD
As Mo-FDA-S is based on the original version of FDA, the sensitivity analysis with regards to its parameters can be found in [24]. However two scalarization methods have be used to study their impact: the Weighted Sum and the Tchebychev methods. The stopping criterion has been set VOLUME 8, 2020  up to 5000 × D and 10 − 5 as precision tolerance ω min . Using the Hypervolumes in Table 2, it is obvious that the Tchebychev method much more efficient than the aggregation method. The same results has been obtained for all ZDT, DTLZ, and Fonseca-Fleming benchmarks. As an illustration, Fig. 5 shows the obtained results for the ''Fonseca-Fleming'' benchmark for different dimensions [9]. Besides, it is interesting to note that Mo-FDA-S works well at low dimensions  and high dimensions but is less efficient on intermediate dimensions.

B. IMPACT OF DOMINANCE BASED EXPLORATION AND EXPLOITATION
The dominance based exploration and exploitation of hyperspheres Fractals is analyzed with regards to three parameters: the number of solutions evaluated in the hyperspheres nb, the fractal depth k, and the step size λ by which ω is multiplied in ILS. It is important to highlight the fact that in the different cases, the common criteria are the number of function evaluations set up to 5000 × D as stopping criterion and 10 − 5 as precision tolerance ω min .
Using DoE (Design of Experiments) methodology, 8 scenarios of values have been selected for this set of parameters. All those scenarios are illustrated in Fig. 6 and results are shown in Table 3 for a small problem (i.e. D = 2). For small problems, ILS tends to concentrate the Pareto front around one some area and therefore penalizes the diversification of the Pareto front. Going to deep hyperspheres (i.e. k = 16) will decrease the performances of the algorithm in terms of quality, whether ILS is used or not. However, using ILS increases significantly the computing time. Parameters values maximizing the quality (i.e. hypervolume) lead to an increase in the computing time and vice versa. For small size problems, the best configuration for the set of parameters is (nb = 6, k = 8, no ILS).
To analyse the results for large scale problems, the selected benchmark ZDT and DTLZ are scaled to dimension D = 30. The Pareto sets of the different cases are shown in Figure 7 and the quantitative results are shown in Table 4. Those results highlight the important fact that the use of ILS becomes essential when the dimension of the problem increases.

C. COMPARISON WITH STATE-OF-THE-ART ALGORITHMS
A comparison of our developed approaches has been carried out with some popular multi-objective evolutionary VOLUME 8, 2020  algorithms (MOEAs). We have considered the 5 following MOEA algorithms: • NSGA-II: a MOEA based on a non dominated sorting approach [6].
• NSGA-III: an extension of NSGA-II adapted to solve many-objective problems, i.e. more than 3 objectives. It works with a set of supplied or predefined reference points aiming to maintain the diversity among population members [5].
• MOEA/D-DE: a decomposition-based approach which uses scalarization to transform the MOP into a singleobjective problem [17]. The different scalar optimization sub-problems are optimized simultaneously. The Tchebychev method is used as a scalarization method. • GWASFGA: a global weighting achievement scalarizing function genetic algorithm [29]. This algorithm is also based on a scalarization method and uses an achievement scalarizing function which is based on the Tchebychev method but includes the use of the Utopian and the Nadir points.
• CDG: a decomposition-based MOEA. Instead of using a traditional scalarization method such as Tchebychev, CDG-MOEA uses a constrained decomposition with grids [1]. One objective function is selected to be optimized while the other objective functions are converted into constraints by setting up the upper and lower bounds. All experiments on the competing algorithms have been done using the framework jMetal 5.0 [7], [25]. Settings for the algorithms have been set according to [15] as well as  default values in jMetal [7]. Population size has been set to N = 100 and the stopping criterion Max F ES = 300, 000. As the competing algorithms are stochastic, their results have been averaged over 20 independent runs. As a recall, both Mo-FDA-S and Mo-FDA-D are deterministic algorithms and their results have been obtained after a single run.
As mentioned earlier, we have decided to use a set of 8 functions, 5 from the ZDT family problems and 3 from the DTLZ. The dimension set is D = 30 for both benchmarks. To compare the results obtained by the different algorithms, we used the Friedman Rank sum method to rank the approach based on their performance on each metric and each function. VOLUME 8, 2020

1) TWO-OBJECTIVE FUNCTIONS
First we have focused on both the ZDT and DTLZ benchmarks using 2 objective functions. Results from the competing algorithms are shown in Tables 5 to 8. On each  table the values in bold highlight the best algorithm for the given function and the given metric and the value in parentheses represents the rank for the given function. As a recall, the standard deviation for Mo-FDA-S and Mo-FDA-D are equal to zero as they are deterministic algorithms. Mo-FDA-D is regularly ranked first on the first three metrics    but last on the fourth metrics. This means that Mo-FDA-D finds good Pareto fronts, close to the true Pareto front but the solutions are less spread that the other algorithms. Regarding the other approach, Mo-FDA-S shows more stability over all the metrics.
Final ranks are shown in Fig. 8. Table 10 shows the final rank based on the values found in Table 9. This is also illustrated in the Fig. 8. This data shows that Mo-FDA-D is the best algorithm on three metrics, i.e. the Hypervolume, the GD and IGD. However, it performs the worst on the Spread metric. This shows a lack of diversity in the Pareto   front compared to other algorithms. However, Mo-FDA-S is complementary to Mo-FDA-D where it performs well on the Hypervolume and GD and outperforms the other methods on the Spread. This means that scalarization allows finding a well Pareto front with good diversity. Those conclusions can be seen in Fig. 9 corresponding to the different Pareto sets obtained via our algorithms on 4 selected functions.

2) MANY-OBJECTIVE FUNCTIONS
A comparison has been realized for many-objective problems composed of 3 objectives (DTLZ). Results from the competing algorithms are shown in Tables 11 to 14. On each table VOLUME 8, 2020    the values in bold highlight the best algorithm for the given function and the given metric and the value in parenthesis represent the rank. Mo-FDA-SD does not perform as well on 3-Objective functions. However, on each function, it outperforms the other algorithm on at least one metric. Final ranks are shown on Table 10 based on the values found in Table 9. Fig. 10 shows that Mo-FDA-D is the best algorithm for the GD metric. This means that the points found by Mo-FDA-D are closer to the true Pareto front than the other algorithm. It is ranked second on the IGD and similarly to the 2-Objective function, struggle to perform on the Spread. The best algorithm overall is NSGA-III as it has been adapted to manyobjective problems. Our algorithm, Mo-FDA-D is, overall, ranked second in the studied metrics, which shows promising results. Those conclusions can be seen in Fig. 11 representing the Pareto sets of our algorithms the four DTLZ 3-objective functions.

VII. CONCLUSION AND PERSPECTIVES
In this article, we have proposed new deterministic decomposition approaches to solve MOPs. The decomposition approach is based on geometrical fractal using hyperspheres. The exploration an the exploitation phases of the FDA algorithm have been extended to solve MOPs. Parallel scalarization and dominance based approaches have been developed. A comparison with state-of-the-art MOEAs shows that the proposed algorithms are very competitive in terms of the quality of the obtained Pareto fronts.
The main issues of the Fractal Decomposition Algorithms when handling MOPs are the hypersphere selection (exploration phase) and the Intensive Local Search. In this article, low complex and efficient strategies have been investigated. However, more sophisticated strategies could be designed to deal with those two important issues.
Furthermore, in Mo-FDA-D the hypervolume is used as a selection indicator for the hyperspheres. One can investigate the Spread metric to both improve the hypersphere selection as well as the best solutions to search around using ILS.
A massively parallel implementation on heterogeneous architectures composed of multi-cores and GPUs is under development to take profit from FDA added value: intrinsically parallel. Besides, we will investigate the adaptation of the algorithms to large scale MOPs such as the hyperparameter optimization of deep neural networks.

APPENDIX DETAILS OF SELECTED BENCHMARK
This appendix contains the functions and the pareto fronts of the considered benchmarks. Table 17 shows the ZDT problems and Fig. 12 their true pareto fronts. Table 18 shows the DTLZ problems, Fig. 13 their true pareto fronts in 2D and Fig. 14 their true pareto fronts in 3D.