Graph-Based Deep Decomposition for Overlapping Large-Scale Optimization Problems

Decomposition methods play a critical role in cooperative co-evolutionary algorithms (CCEAs) for solving large-scale optimization problems. Although some well-performing decomposition methods have been designed based on the interactions among variables (IaV), their grouping accuracy is still limited, due to poor performance on overlapping problems and the computational roundoff errors of IaV in the implementation. To deal with these limitations, a graph-based deep decomposition (GDD) method is proposed to obtain more accurate grouping results, especially for overlapping problems. On the one hand, GDD mines the IaV information and obtains the minimum vertex separator of the interaction graph of the variables, so as to group the variables deeply and recursively. On the other hand, GDD is fault tolerant to the computational roundoff errors of IaV, which improves the grouping accuracy. For better experimental studies of overlapping problems, a novel overlapping function generator with random and complicated overlap types is designed, and two new metrics are proposed to evaluate the grouping accuracy. Comprehensive experiments show that GDD greatly improves the grouping accuracy and helps CCEAs perform better than other existing algorithms, especially on overlapping problems. In addition, GDD is highly fault tolerant and can divide problems accurately even with inaccurate IaV.


I. INTRODUCTION
With data growing explosively, large-scale optimization problems (LSOPs) have attracted increasing attention and become a hot research topic in many systems engineering fields [1], [2], [3], such as the multiobjective optimization of large-scale capacitated arc routing problems [4], the constrained optimization of large-scale power systems [5], and the large-scale optimization of supply chain systems [6], [7]. Compared with traditional optimization problems, a much larger number of decision variables need to be optimized in LSOPs. In this case, the increase in problem size makes it difficult to obtain the global optimal solution within limited computational resources, e.g., within the maximal number of fitness evaluations [8], [9], [10].
The second category of EAs, usually called cooperative co-evolutionary algorithms (CCEAs), decomposes the LSOP into several subproblems and then solves the subproblems with different subpopulations [30], [31]. The decomposition of an LSOP can reduce the search space of each subproblem and improve the search efficiency of CCEAs [32]. The most important part of CCEAs is the decomposition method, which aims to put the interacting variables of the problem into the same group and divide the noninteracting variables into different groups. Each group corresponds to a subproblem. An excellent decomposition method can divide variables accurately and, therefore, improve the optimization performance of CCEAs for solving LSOPs. Two main kinds of decomposition methods are widely used: 1) random grouping [33] and 2) differential grouping (DG) [34], [35], [36], [37], [38], [39], [40], [41], [42]. The grouping accuracy of DG is usually higher than that of random grouping [34], because the DG methods calculate the interactions among variables (IaV) of the LSOP to help group the variables.
However, due to the complexity of LSOPs and the computational roundoff errors of IaV, it is still difficult for the existing DG methods to decompose LSOPs effectively, especially overlapping LSOPs. In complex overlapping LSOPs, there are overlaps among the ideal groups, where these groups and their overlaps are called overlapping groups and overlapping variables, respectively (refer to Section II-A for the definition). For example, in the overlapping function f(X) = (x1 + x2 + x4)^2 + (x1 + x3 + x5)^2 with X = (x1, x2, x3, x4, x5)^T, there are two overlapping groups (x1, x2, x4)^T and (x1, x3, x5)^T and an overlapping variable x1 in the ideal case. The two groups can actually be treated separately as two overlapping groups, but the existing DG methods will group all the variables of the problem into one larger group (i.e., X = (x1, x2, x3, x4, x5)^T), which hampers the optimization efficiency of CCEAs on LSOPs [38]. This is because the DG methods do not identify the overlapping variables and treat the overlapping groups as a whole. Besides, the grouping accuracy of the existing DG methods is still limited by the accuracy of IaV. That is, IaV may sometimes be inaccurate (compared with the ideal IaV) due to computational roundoff errors, e.g., grouping some noninteracting variables together. As a result, the inaccurate IaV of these DG methods will inevitably lead to inaccurate grouping results. Therefore, to decompose overlapping LSOPs and deal with inaccurate IaV, this article proposes a graph-based deep decomposition (GDD) method, which is crucial for obtaining more accurate groups to enhance the performance of CCEAs. GDD is inspired by the graph cut [43], which can divide a complicated graph into small subgraphs via the minimum vertex separator (MVS), since an LSOP with IaV can be regarded as a graph.
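To make the graph view concrete, the following sketch (hypothetical Python, not the paper's code) builds the interaction graph of the example function above and shows that removing the overlapping variable x1 is exactly a vertex separator: the remaining graph splits into the two ideal groups.

```python
from collections import defaultdict

def components(vertices, edges):
    """Connected components of an undirected graph via depth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(adj[u] - seen)
        comps.append(comp)
    return comps

# Interaction graph of f(X) = (x1+x2+x4)^2 + (x1+x3+x5)^2:
# every pair of variables inside the same term interacts.
edges = [(1, 2), (1, 4), (2, 4), (1, 3), (1, 5), (3, 5)]
print(components({1, 2, 3, 4, 5}, edges))  # one connected component
# Removing the overlapping variable x1 disconnects the graph.
rest = [(u, v) for u, v in edges if 1 not in (u, v)]
print(components({2, 3, 4, 5}, rest))      # two components: {2, 4} and {3, 5}
```

This is why a DG method that only merges connected components returns the single group (x1, ..., x5)^T, while a separator-aware method can recover the two overlapping groups.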
On the one hand, the GDD uses IaV to deeply and recursively decompose the overlapping groups, where three rules are designed to help to determine the recursive decomposition. On the other hand, due to the decomposition ability on overlapping LSOPs, the GDD has the ability of fault tolerance for dealing with the computational roundoff errors and improving the final grouping accuracy.
In GDD, IaV is first obtained by the existing DG methods, and a graph can be constructed according to IaV. Then, the connected components of the graph correspond to the initial groups of the problem. Since some of these initial groups may have overlaps, a recursive overlapping group decomposition (ROGD) method is proposed to divide the overlapping groups based on the MVS of the corresponding graph. In ROGD, three rules are designed to help determine the recursive decomposition, including how to deal with the MVS (the first two rules) and with isolated variables (the third rule) to construct the complete groups. Each group is regarded as a graph, and the max-flow algorithm is used to obtain the MVS [44]. If the group is an overlapping group, the MVS of the corresponding graph is equivalent to the overlapping variables, and the group can be decomposed by the MVS; otherwise, the MVS of this group is empty, and the group cannot be decomposed. Finally, overlapping groups with too small sizes are merged to save fitness evaluations, and the final groups are obtained for the CCEAs.
GDD can not only decompose overlapping LSOPs but also has the ability of fault tolerance. Concretely, in a nonoverlapping LSOP, if independent variables in two groups are wrongly judged as interacting variables due to the computational roundoff errors of IaV, the two groups will be regarded as one group in some existing DG methods [36], [37], which is not efficient for the CCEA optimization. However, such mistakenly grouped variables can be easily separated by the overlapping variables (the misjudged interacting variables) in GDD. For example, in f(X) = (x1 + x2 + x4)^2 + (x3 + x5)^2 with X = (x1, x2, x3, x4, x5)^T, there are two groups (x1, x2, x4)^T and (x3, x5)^T in the ideal case. However, if x1 and x3 are wrongly judged to be interacting in IaV, only one group (x1, x2, x3, x4, x5)^T will be obtained by some DG methods. In contrast, in GDD, x1 can be regarded as an overlapping variable, and the final groups are (x1, x2, x4)^T and (x1, x3, x5)^T, which are closer to the ideal case. Moreover, GDD is used after obtaining IaV and does not consume additional fitness evaluations. The contributions of this article are presented as follows.
1) The GDD method is proposed to deeply and recursively decompose LSOPs via the MVS, and three rules are designed to help the recursive decomposition. This decomposition method can obtain more accurate grouping results, which is significant for improving the optimization efficiency of CCEAs on LSOPs, especially on overlapping problems.
2) Due to its decomposition ability on overlapping LSOPs, GDD is fault tolerant and can obtain higher grouping accuracy on LSOPs.
3) A novel overlapping function generator is proposed with random and complex overlap types. It can be used as a routine for generating different kinds of overlapping LSOPs to test the grouping efficiency of decomposition algorithms, which is significant for further research into overlapping LSOPs in the community.
4) Two new metrics are designed to evaluate the grouping accuracy, namely, the overlapping rate and the redundancy rate of the grouping results obtained by the decomposition algorithms versus the ideal grouping results.
The remainder of this article is organized as follows. Section II introduces the existing decomposition methods for LSOPs, the overlapping problems, and the MVS of a graph. Section III describes the details of the proposed GDD method. Section IV presents the experiments, including the analysis of the grouping accuracy and the fault-tolerance ability of GDD and the optimization efficiency of CCEAs combined with GDD. Finally, Section V concludes this article.

A. Overlapping Problems
In the overlapping problems, there are some subcomponents with overlap, but these subcomponents can be separated after dealing with the overlap. The overlapping problem can be defined as follows.
Definition 1: If arg min f(X) = (arg min f(X1, . . .), . . . , arg min f(. . . , Xk)) and some of the Xi share variables, then f(X) is an overlapping function and is separable into k nonseparable groups, where X = (x1, . . . , xD)^T is a decision vector with D dimensions, and X1 to Xk are subvectors of X. If Xi and Xj overlap, they are denoted as overlapping groups, and their shared variables are denoted as overlapping variables.
For example, the ideal groups of the overlapping problems f13 and f14 in the IEEE Congress on Evolutionary Computation (IEEE CEC) 2013 suite are pairwise joint in a chain type, as shown in [45]. Therefore, the overlapping variables are the connections between the overlapping groups. The existing grouping methods, such as DG [34], extended DG (XDG) [35], global DG (GDG) [36], DG2 [37], and recursive DG (RDG) [38], will regard all the variables in f13 or f14 as interacting and merge them into one larger group. In fact, the larger group can be efficiently separated by further dealing with the overlapping variables, although it is difficult to obtain the overlapping variables. Therefore, the GDD method is proposed in this article to identify the overlapping variables.

B. Decomposition Methods
The decomposition methods divide the variables of an LSOP into several groups. Let f(X) denote the objective function of the LSOP. The separability of f(X) can be defined as follows.
Definition 2: If arg min f(X) = (arg min f(X1, . . .), . . . , arg min f(. . . , Xk)) and the subvectors X1 to Xk are pairwise disjoint, then f(X) is partially separable with k nonseparable groups.
The interaction exists only among variables of the same group. However, it is difficult to identify IaV and group the variables accurately, although various decomposition methods have been proposed, such as random grouping [33], [46] and DG [34], [35], [36], [37], [38], [39], [40]. Random grouping updates the groups in every iteration of CCEAs. It ignores IaV and attempts different random grouping strategies during the evolutionary process. Therefore, random grouping may result in low grouping accuracy and influence the solving efficiency of CCEAs [46]. Different from random grouping, DG identifies IaV by calculating the difference of the corresponding objective function values, but it only considers a part of the direct variable interactions [34]. For example, if x1 interacts with x2, x2 will be directly assigned to the group of x1, and the other interacting variables of x2 will not be detected. There are two shortcomings of DG: the incompleteness of IaV and the ignorance of computational errors (e.g., the roundoff errors of floating-point operations). Therefore, XDG is proposed to identify indirect variable interactions and obtain more IaV information than DG [35]. Afterward, GDG takes the computational errors into consideration and treats the LSOP as a graph to decompose the problem and obtain the complete IaV [36]. Furthermore, DG2 (an improved variant of GDG) groups variables more accurately by setting a reliable threshold value and has a higher calculation efficiency than GDG [37]. However, these grouping methods cannot solve the overlapping problems, where the vectors X1 to Xk in Definition 2 are not pairwise disjoint. Although GDG and DG2 obtain the complete IaV information, they will merge two groups that have overlapping variables. Therefore, the number of variables in a group may still be too large due to the merging, resulting in difficulty in optimizing the LSOPs.
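The core of all DG variants is the pairwise test sketched below: two variables interact if perturbing one changes the effect of perturbing the other on the objective value. This is a minimal illustration with hypothetical names; real DG implementations choose the perturbation size `delta` and the threshold `eps` far more carefully, precisely because of the roundoff errors discussed above.

```python
def interact(f, x, i, j, delta=1.0, eps=1e-6):
    """DG-style pairwise interaction check: i and j interact if the
    effect of perturbing x_i depends on the value of x_j."""
    x = list(x)
    xi = list(x)
    xi[i] += delta
    d1 = f(xi) - f(x)        # effect of moving x_i alone
    xj = list(x)
    xj[j] += delta
    xij = list(xi)
    xij[j] += delta
    d2 = f(xij) - f(xj)      # effect of the same move after shifting x_j
    return abs(d1 - d2) > eps

# The overlapping example from the introduction.
f = lambda x: (x[0] + x[1] + x[3]) ** 2 + (x[0] + x[2] + x[4]) ** 2
base = [0.0] * 5
print(interact(f, base, 0, 1))  # True: x1 and x2 share a term
print(interact(f, base, 1, 2))  # False: x2 and x3 never co-occur
```

Running the check for every pair yields exactly the (0, 1) IaV matrix that GDG and DG2 produce, at the cost of the extra fitness evaluations mentioned in the next paragraph.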
As the above variants of DG need a large number of fitness evaluations to obtain the IaV information, RDG is proposed to detect IaV by binary search with less computational cost [38]. Moreover, RDG2 considers the computational errors and improves the grouping accuracy of RDG [39]. Based on RDG2, RDG3 is designed to decompose overlapping functions [40]. During the decomposition, if the size of a group is large, RDG3 will forcedly divide the variables of this group into smaller groups. In this way, RDG3 can decompose overlapping functions. However, the forced decomposition may also divide interacting variables into different groups, which breaks the independence among groups and affects the optimization efficiency of CCEAs. Besides, as RDG, RDG2, and RDG3 do not identify the interactions between each pair of variables, they cannot obtain the complete IaV.

C. MVS of the Graph
For a connected graph G, the removal of its MVS disconnects G, and the size of the MVS equals the vertex connectivity of G, denoted as κ(G) [43], [47]. Let V and E be the vertex set and the edge set of the graph G(V, E), respectively. If there exist k (1 ≤ k ≤ |V|) vertices whose removal disconnects G, while the removal of any (k−1) vertices does not, then κ(G) = k and the set of these k vertices is recorded as the MVS of G. Since the vertices and edges of a graph can be mapped to the vertices, edges, and edge weights of a network, it has been proved that the graph can be transformed into a network and the calculation of the vertex connectivity can be transformed into a max-flow problem in that network [44]. Before calculating the maximum flow, the graph G needs to be transformed into the network N. To be more specific, as described in [44], each vertex v ∈ V in the graph G corresponds to two vertices v′ and v″ and an edge (v′, v″) with weight 1 in the network N, and each edge (u, v) ∈ E in the graph G corresponds to two edges (u″, v′) and (v″, u′) with weight infinity (∞) in N. This article uses the Dinic algorithm [48] to solve the max-flow problem; the MVS can then be obtained by traversing the residual network [49].
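The vertex-splitting reduction can be sketched as follows. This is a hypothetical snippet, not the paper's code: for brevity it uses simple BFS augmentation (Edmonds-Karp style) instead of the Dinic algorithm, and it computes a minimum s-t vertex cut for one chosen non-adjacent pair; a full MVS routine would minimize over such pairs.

```python
from collections import deque

INF = float("inf")

def st_vertex_cut(vertices, edges, s, t):
    """Minimum s-t vertex cut via the vertex-splitting network of [44].
    s and t must be non-adjacent."""
    cap = {}
    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)
    for v in vertices:
        add(("in", v), ("out", v), 1)        # unit capacity per vertex
    for u, v in edges:
        add(("out", u), ("in", v), INF)      # edges get infinite capacity
        add(("out", v), ("in", u), INF)
    src, snk = ("out", s), ("in", t)
    adj = {}
    for (u, v) in cap:
        adj.setdefault(u, []).append(v)
    def bfs():
        prev = {src: None}
        q = deque([src])
        while q:
            u = q.popleft()
            if u == snk:
                return prev
            for v in adj.get(u, []):
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    q.append(v)
        return None
    while True:                              # augment one unit per path
        prev = bfs()
        if prev is None:
            break
        v = snk
        while prev[v] is not None:
            u = prev[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
    seen = {src}                             # residual reachability = cut side
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj.get(u, []):
            if v not in seen and cap[(u, v)] > 0:
                seen.add(v)
                q.append(v)
    return {v for v in vertices
            if ("in", v) in seen and ("out", v) not in seen}

# Interaction graph of f(X) = (x1+x2+x4)^2 + (x1+x3+x5)^2.
edges = [(1, 2), (1, 4), (2, 4), (1, 3), (1, 5), (3, 5)]
print(st_vertex_cut({1, 2, 3, 4, 5}, edges, 2, 3))  # {1}
```

The saturated unit-capacity arcs (v′, v″) on the source side of the residual network identify the separator vertices, here the overlapping variable x1.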

III. GRAPH-BASED DEEP DECOMPOSITION METHOD
The decomposition method that groups variables according to IaV is an effective approach for CCEAs to solve LSOPs [34], [35], [36], [37], [38], [39], [40]. IaV can be represented by a (0, 1)-matrix [36], [37]. If the entry (i, j) of the matrix is 1, where i and j are two variables, it represents that i interacts with j; otherwise, i and j are independent. Based on the IaV matrix, a graph can be obtained. Each variable of the problem corresponds to a vertex in the graph. If two variables interact, there is an edge connecting the corresponding two vertices; otherwise, there is no connection between them. The nonseparable (i.e., interacting) groups are equivalent to the connected components of the graph [36], [37]. Separable variables are equivalent to the components with a single vertex.
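The step from the (0, 1)-matrix to the initial groups can be written as a short sketch (hypothetical helper name; the paper's implementation uses the Warshall algorithm mentioned in Section III-C):

```python
def initial_groups(theta):
    """Connected components of the interaction graph encoded by a
    (0,1) IaV matrix; singleton components are separable variables."""
    d = len(theta)
    seen, groups, sep = set(), [], []
    for s in range(d):
        if s in seen:
            continue
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(j for j in range(d) if theta[u][j] and j not in seen)
        (groups if len(comp) > 1 else sep).append(comp)
    return groups, sep

# IaV matrix (0-indexed) for f(X) = (x1+x2+x4)^2 + (x1+x3+x5)^2
# plus an extra separable variable x6.
theta = [
    [0, 1, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
groups, sep = initial_groups(theta)
print(groups)  # one nonseparable group containing x1..x5
print(sep)     # the separable variable x6
```

These connected components are exactly the CNC set that Algorithm 1 hands to ROGD.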
In this section, the GDD method is proposed, inspired by the idea of the graph cut. The proposed method mines the interactive information among variables obtained by the DG methods to deeply divide variables more accurately, especially for overlapping problems.
Algorithm 1 shows the pseudocode of the GDD method. The input IaV matrix is obtained by the existing DG methods, and D is the dimension of the problem. The output Group is the set of the final nonseparable groups, and sep is the set of separable variables. The function ConnComp is used to obtain the set of connected components CNC in line 3. After removing sep from CNC, if CNC is not empty, the ROGD algorithm is carried out to divide the overlapping groups (nonseparable groups with overlapping variables) efficiently. After ROGD, an adjustment strategy (Adjust) is carried out to merge groups that are too small and have overlapping variables.

A. Recursive Overlapping Group Decomposition
The ROGD algorithm aims to divide overlapping groups (connected components) into smaller groups. It can improve the decomposition efficiency of the overlapping LSOPs and help CCEAs to solve problems more effectively.
If each connected component (a nonseparable group) is regarded as a graph, it can be divided into several components after the removal of its MVS (described in Section II-C). In overlapping groups, the MVS can be regarded as the overlapping variables. Let S denote the vertex set of the MVS of a nonseparable group G, and let U denote the graph after removing S from G. If U can be separated into two connected components (U′ and U″), the group can be decomposed accordingly. However, if the sizes of the new connected components (i.e., U′ and U″) are still too large after removing the MVS, these components need to be further divided. Therefore, the ROGD algorithm is designed in a recursive way to divide an overlapping group into smaller groups with appropriate sizes. In ROGD, the termination of the algorithm and the completeness of grouping (the complete groups include all variables of the problem) are described as follows.
1) Termination of ROGD: For a group G, if the number of its vertices |V| is too small or the size of its MVS (|S|) is too large, G will not be separated any more. Therefore, in the proposed ROGD algorithm, if |V| ≤ D/α or |S| ≥ |V|/β, where D is the dimension of the problem, the recursive decomposition of the group G will terminate.
2) Completeness of Grouping: To describe the ROGD algorithm clearly, the decomposition process is regarded as a tree structure. Each tree node represents a group or a separable variable, and the leaf nodes represent groups that are not divided any more. It should be noted that a tree node also corresponds to a graph.
For the completeness of grouping, not only the intergroup independence but also the intragroup interaction should be satisfied. That is, for a variable v in the group G (v not in the overlapping components), all of its interacting variables should be added into G. Therefore, a complete group should include the MVS of its ancestor nodes (denoted as MVS_anc), because the MVS_anc interacts with some variables of this group. In this way, the groups are also independent after ignoring the overlapping variables. An example of the decomposition of the graph G [45] is also given in Section S-II of the supplementary material.
There are two main parts of the recursive decomposition. The first part is to use the breadth-first search (BFS) [50] to implement the recursive decomposition. A group will be divided recursively until it satisfies the termination condition of ROGD. For example, in Fig. 1(b), G1 is divided into G2, G3, and G4. G2 and G4 do not need to be divided any more, but G3 is further divided into G5 and G6 by removing S3. G2, G4, G5, and G6 are the leaf nodes. The second part is to add the MVS to the groups for the completeness of grouping. Herein, three points should be taken into consideration, and three corresponding rules are designed as follows.
1) The first point is which MVS should be added.

Rule 1: A node (group) G only adds the MVS_anc that interacts with its vertices V.
Explanation: This rule is to avoid blindly adding all of the MVS_anc into G. There may be two cases. The first case is that the MVS of the father node will certainly be added, since this MVS is the minimum set of vertices separating the father node and certainly interacts with V. The second case is that the MVS of a grandfather node or an older ancestor node will be added if and only if that MVS interacts with V. For example, in Fig. 1(c), G5 and G6 add the MVS (i.e., S3) of their father node (i.e., G3) to form the final groups. However, for a node G, not all of the MVS_anc interacts with V, and only the MVS_anc that interacts with V is added into G. For example, for G5 in Fig. 1(b), only S3 ({6}, the MVS of its father node G3) interacts with the vertices (2 and 5) in G5. Therefore, only S3 is added into G5, but S1 (the MVS of its grandfather node G1) is not added (as S1 has no interaction with G5).
2) The second point is when to add the MVS of the group G into the nodes to construct complete groups, e.g., after dividing G for the first time or after finding all the leaf nodes of G.
Rule 2: The MVS of the group G is added after the whole decomposition to construct the final groups.
Explanation: If the MVS is added during the decomposition of G, it will be involved in the following decomposition of G, which may increase the repetition of the MVS. However, if the MVS is added after the whole decomposition, it will only be involved in the leaf nodes of G, which helps to accelerate the decomposition process. The depth-first search (DFS) [51] is used to find the MVS_anc of a node. For a leaf node, all MVSs of its ancestor nodes are traversed and judged on whether to be added or not. Fig. 1(b) and (c) shows the example of adding the MVS after finding all the leaf nodes. After removing the MVS of G1 (S1), G1 can be decomposed into G2, G3, and G4, where G2 and G4 are leaf nodes. After removing the MVS of G3 (S3), G3 can be divided into two leaf nodes G5 and G6. For the leaf node G5, the MVS_anc is {S1 ∪ S3}, and only the interacting MVS_anc (S3) is added into G5 to form the final group G9 in Fig. 1(c). Similarly, S1 is added into G2 and G4 to form G7 and G8, respectively, and S1 and S3 are added into G6 to form G10. The final groups in Fig. 1(c) include G7, G8, G9, and G10. Fig. 1(d) shows the example of adding the MVS after dividing the group for the first time. Different from the decomposition in Fig. 1(b) and (c), after removing S1 and dividing G1, S1 is added into the decomposed parts to form the nodes G2, G3, and G4. Similarly, after removing S3 and dividing G3, S3 is added into the decomposed parts to form the nodes G5, G6, and G7. The final groups in Fig. 1(d) include G2, G4, G5, G6, and G7. G7 is redundant in the decomposition in Fig. 1(d), since it includes the vertices 6 and 8, which have both appeared in the other final groups (G2, G4, G5, and G6).
3) The third point is how to deal with the isolated vertices after removing the MVS of the group G, such as the vertex D in Fig. S-1 of the supplementary material. To construct the tree structure of ROGD, a tree node (Tnode) representing a group g should include the variable set of g (v), the MVS (overlapping variables) of g (ovp), and its parent node (parent). The initialization of a tree node based on the group g is shown in Algorithm 2.
The DFS_MVSanc algorithm finds the MVS_anc that interacts with the current traversing node, as shown in Algorithm 3. For the input parameters, TNode is the set of tree nodes, Tnode is the current traversing node, and the IaV matrix is obtained from the DG methods. For the output parameters, MVS includes the MVS_anc that interacts with the variables of Tnode.
3) Complete ROGD: The details of ROGD are shown in Algorithm 4, where CNC is the set of connected components (the initial groups) calculated from the IaV matrix, D is the dimension of the problem, and Group is the set of final groups. The function getMVS in line 13 obtains the MVS by the Dinic algorithm [48] and the traversal of the residual network [49] (as shown in Section II-C).
TNode records the set of tree nodes, and queNode records the queue of indices of the tree nodes (TNode) traversed by BFS. At the beginning, the groups (tree nodes) in CNC are added to TNode, and the corresponding indices are added to queNode (lines 5-8). For example, in Fig. 1(b), CNC = {G1} and, therefore, TNode1 is G1 and queNode1 is 1. Then, BFS is used to traverse all tree nodes (lines 10-30). In BFS, getMVS is used to find the MVS (overlapping variables) of the current traversing node Tnode (the node in TNode with the index queNodeh) to judge whether the node can be divided or not. If the condition in line 14 (the termination of ROGD, including the minimum group size D/α and the maximum MVS size |Tnode.v|/β) is satisfied, Tnode is a leaf node and does not need to be divided again. After adding the MVS_anc (lines 15-17), Tnode is regarded as a final group and is added into Group (line 18). If the condition in line 14 is not satisfied, Tnode needs to be divided again (lines 20-30). After removing the overlapping variables (Tnode.ovp), Tnode is divided, and Children are the decomposed groups (lines 20 and 21). If there are separable variables (seps) in Children, seps will be taken as a final group after adding the interacting MVS_anc and will be removed from Children (lines 23-26). Then, all the children nodes in Children are added into TNode, and their indices are added into queNode (lines 27-30). Afterward, the next tree node is traversed. For example, in Fig. 1(b), the group G1 in CNC is added into TNode, and queNode = {1} (lines 5-8). In the first loop (lines 10-30), queNodeh = 1 and the node G1 is traversed. The tree nodes G2, G3, and G4 (children nodes of G1) are added into TNode, and queNode = {1, 2, 3, 4}. In the second loop, queNodeh = 2 and G2 is traversed. Because G2 is a leaf node, no nodes are added into TNode. In the third loop, queNodeh = 3 and G3 is traversed. G5 and G6 (children nodes of G3) are added into TNode, and queNode = {1, 2, 3, 4, 5, 6}. In the following loops, G4, G5, and G6 are traversed, respectively, and no nodes are added into TNode, since G4, G5, and G6 are all leaf nodes and do not have children nodes.
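The BFS traversal, the termination test, and the deferred addition of the interacting MVS_anc (Rules 1 and 2) can be condensed into the following sketch. Everything here is illustrative: `brute_mvs` is a toy stand-in for getMVS (the paper uses Dinic max-flow), and the defaults for α and β are arbitrary.

```python
from collections import deque
from itertools import combinations

def components(vs, adj):
    """Connected components of the subgraph induced by vertex set vs."""
    seen, out = set(), []
    for s in vs:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend((adj[u] & vs) - seen)
        out.append(comp)
    return out

def brute_mvs(vs, adj):
    """Toy getMVS: try all small vertex sets (fine for tiny graphs only)."""
    for k in range(1, len(vs)):
        for S in combinations(sorted(vs), k):
            if len(components(vs - set(S), adj)) > 1:
                return set(S)
    return set()

def rogd(group, adj, D, alpha=4, beta=4, get_mvs=brute_mvs):
    """Sketch of ROGD: BFS over tree nodes, split each node by its MVS,
    and add the interacting MVS_anc to leaves after the decomposition."""
    final = []
    que = deque([(frozenset(group), [])])   # (variables, ancestor MVSs)
    while que:
        vs, anc = que.popleft()
        S = get_mvs(set(vs), adj)
        if len(vs) <= D / alpha or not S or len(S) >= len(vs) / beta:
            # leaf: keep only the ancestor-MVS vertices interacting with vs
            mvs_anc = {a for mvs in anc for a in mvs
                       if any(b in adj[a] for b in vs)}
            final.append(set(vs) | mvs_anc)
        else:
            for comp in components(set(vs) - S, adj):
                que.append((frozenset(comp), anc + [S]))
    return final

# Interaction graph of f(X) = (x1+x2+x4)^2 + (x1+x3+x5)^2.
adj = {1: {2, 3, 4, 5}, 2: {1, 4}, 3: {1, 5}, 4: {1, 2}, 5: {1, 3}}
print(rogd({1, 2, 3, 4, 5}, adj, D=5))
# two overlapping groups sharing x1: {1, 2, 4} and {1, 3, 5}
```

The root is split by its MVS {1}; both children terminate (their MVS is empty), and each leaf absorbs the ancestor separator it interacts with, yielding the ideal overlapping groups.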

B. Adjustment of Grouping
After the recursive decomposition in ROGD, if the number of groups is still small, the connectivity of the graph is strong. Considering the independence among groups in CCEAs, there is no need to divide a strongly connected graph. However, if the number of groups is large, such as for f12 in IEEE CEC 2013 [45], which has 496 groups after being decomposed by ROGD, it will consume many fitness evaluations in every iteration and shorten the evolutionary process of CCEAs (assuming that the terminating condition is the maximum number of fitness evaluations). To reduce the number of groups, a grouping adjustment method is proposed to merge small groups with overlapping components.
For a problem with D dimensions (a graph with D vertices), if the number of its final groups is more than D/α, small groups will be merged via their overlapping variables. Algorithm 5 shows the details of the grouping adjustment method, where Group is the set of final groups. First, all overlapping variables are collected, and the number of variables occurring in the same groups as each overlapping variable is counted. In the example, this count is the smallest for vertex 8 (MVS.V_num2 = 6), and therefore the groups (G2, G4, and G7) that include vertex 8 will be merged.
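The merging rule can be sketched as follows. This is a hypothetical condensation of Algorithm 5: repeatedly pick the overlapping variable whose groups together hold the fewest variables and merge those groups, until the group count drops to D/α or no overlapping variables remain.

```python
def adjust(groups, D, alpha=4):
    """Sketch of the adjustment step: merge the smallest cluster of
    groups sharing an overlapping variable while there are more than
    D/alpha groups."""
    groups = [set(g) for g in groups]
    while len(groups) > D / alpha:
        holders = {}
        for g in groups:
            for v in g:
                holders.setdefault(v, []).append(g)
        shared = {v: gs for v, gs in holders.items() if len(gs) > 1}
        if not shared:
            break
        # overlapping variable whose groups contain the fewest variables
        v = min(shared, key=lambda u: sum(len(g) for g in shared[u]))
        merged = set().union(*shared[v])
        groups = [g for g in groups if g not in shared[v]] + [merged]
    return groups

groups = [{1, 2}, {2, 3}, {4, 5, 6, 7}, {7, 8, 9, 10}]
print(adjust(groups, D=10, alpha=10))
# merges via variable 2 first (cheapest), then via variable 7:
# [{1, 2, 3}, {4, 5, 6, 7, 8, 9, 10}]
```

Merging the cheapest overlap first keeps the merged groups as small as possible, which matches the goal of saving fitness evaluations without re-creating one huge group.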

C. Complexity Analysis
From the observation of Algorithm 1, it can be seen that the GDD method does not consume fitness evaluations. For the analysis of the time complexity, all components of GDD need to be analyzed, including ConnComp, ROGD (Algorithm 4), and Adjust (Algorithm 5), which are all implemented based on the IaV matrix. ConnComp is also used in GDG [36] and DG2 [37] to construct the initial graph, and it can be implemented by the Warshall algorithm [52], whose complexity is O(D^3).
It is assumed that there are N tree nodes and M final groups (leaf nodes) during the decomposition (M < N). The worst case is a chain-shaped grouping result, such as f12 in IEEE CEC 2013 [45]. In the best case, none of the groups (CNC) obtained in line 3 of Algorithm 1 needs to be divided again, such as f4-f11 in IEEE CEC 2013 [45], and N = M = |CNC|. Without loss of generality, it is assumed that, for a function F, all the middle nodes have only one overlapping variable and are divided in half until the number of variables of a node reaches the lower bound D/α (the termination of ROGD). The decomposition of F is shown in Fig. S-6 of the supplementary material. The function getMVS has a complexity of O(D^3) [49] in every loop, where D is the number of variables of the current traversing node Tnode (line 12). Therefore, getMVS is executed N times in ROGD, and the total complexity of this part is O(|G1|^3 + |G2|^3 + ... + |GN|^3), where G1 to GN are the tree nodes of the decomposition. We assume that the nodes from G(N-M+1) to GN are the M leaf nodes (the final decomposed groups). For DFS_MVSanc (Algorithm 3), the operations between lines 4 and 5 in Algorithm 3 will be executed (|Tnode.v| × |V|) times, where V is the MVS_anc of Tnode. DFS_MVSanc is only executed for the M leaf nodes in ROGD, and the total complexity of this part is O(|G(N-M+1)| × |V(N-M+1)| + ... + |GN| × |VN|), where Vi is the MVS_anc of Gi, and |Gi| and |Vi| are both smaller than D. ConnComp is only executed for the middle nodes (G1 to G(N-M)), and the complexity of this part in ROGD is O(|G1|^3 + |G2|^3 + ... + |G(N-M)|^3). It can be seen that the complexity of getMVS in ROGD is higher than that of DFS_MVSanc and ConnComp. Therefore, the complexity of ROGD mainly depends on getMVS, i.e., O(|G1|^3 + |G2|^3 + ... + |GN|^3), denoted as O(ROGD).
In the worst case (f12 in IEEE CEC 2013 [45]), the tree nodes form a chain and O(ROGD) reaches its maximum. For function F, the number of variables of each tree node is shown in Fig. S-6 of the supplementary material.

IV. EXPERIMENTAL RESULTS AND ANALYSIS
In the comparative experiment, five benchmark functions from IEEE CEC 2013 [45] and 20 randomly generated overlapping functions are tested to verify the performance of GDD in decomposing LSOPs. As GDD deeply decomposes the problem based on the IaV matrix, it is performed on the complete and highly accurate IaV matrices obtained by GDG [36] and DG2 [37], resulting in the corresponding algorithms denoted as GDG_GDD and DG2_GDD, respectively. It should be noted that RDG2 and RDG3 are not combined with GDD, because they cannot obtain the complete IaV. Let the ideal IaV matrix and the IaV matrix obtained by an algorithm A be denoted with the subscripts I and A, respectively. The ideal matrix can be obtained by the method used in [37], with the source code available from https://bitbucket.org/mno/differential-grouping2/src/master/matlab/adjmatrix2013.m.
In the following experiments, the test suite is first introduced, including the 20 new overlapping functions. Then, the grouping accuracy of the decomposition methods before and after employing GDD is analyzed. Afterward, the grouping efficiency and the fault-tolerance ability of GDD are verified. Finally, the GDD-enhanced decomposition methods are incorporated into the third version of the contribution-based cooperative co-evolutionary algorithm (CBCC3) [53] to compare with some state-of-the-art large-scale optimization algorithms. The CBCC3 framework first considers and optimizes the subproblem that contributes most to the current improvement of the whole problem optimization.

A. Test Suite
The test suite includes five functions from IEEE CEC 2013 [45] (f7, f11, f12, f13, and f14) and 20 new overlapping functions o1 to o20 (o_{2i−1} and o_{2i}, i = 1, 2, . . . , 10). Except for P, the other parameters of o_{2i−1} are the same as those of f13, and the other parameters of o_{2i} are the same as those of f14. In f13 and f14, the grouping type is chain shaped (groups are pairwise joint) with a fixed overlap size [45]. Therefore, to diversify the overlap types of the tested problems, the new functions are designed with random overlapping variables and random overlap sizes. For example, Fig. S-7 of the supplementary material shows a complicated overlap type other than the chain-shaped one. The generation of different types of overlapping functions is described in Section S-IV of the supplementary material, including the generation of the vector P, the differences between o1–o20 and f13/f14, and their ideal grouping results. Datasets are available at https://github.com/zhangxin-Jancy/Benchmarks_for_overlappingLSOP.
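As an illustration of such a generator, a random overlapping grouping could be built as follows. This is only a sketch, not the exact procedure of Section S-IV; the function name, parameters, and default values are all illustrative. Each new group reuses a randomly sized set of variables from a randomly chosen earlier group, producing overlap structures beyond the chain shape.

```python
import random

def random_overlapping_groups(n_groups=10, min_size=25, max_size=100,
                              max_overlap=5, seed=0):
    """Sketch: build n_groups variable groups where each group after the
    first shares a random number of variables with a random earlier group."""
    rng = random.Random(seed)
    groups, nxt = [], 0
    for _ in range(n_groups):
        size = rng.randint(min_size, max_size)
        overlap = rng.randint(1, max_overlap) if groups else 0
        # overlapping variables reused from a randomly chosen earlier group
        shared = rng.sample(groups[rng.randrange(len(groups))], overlap) if groups else []
        # fresh, previously unused variable indices
        fresh = list(range(nxt, nxt + size - overlap))
        nxt += size - overlap
        groups.append(shared + fresh)
    return groups
```

Because the donor group is chosen at random, the resulting overlap pattern is generally not chain shaped: a group may overlap with any earlier group, not only its predecessor.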

B. Analysis of Grouping Accuracy
GDG and DG2 use three metrics ρ1, ρ2, and ρ3 to measure the accuracy of identifying the three types of relationships in the IaV matrix [36], [37]. However, these metrics are only related to the matrix and cannot evaluate the real grouping results of algorithms. Therefore, two new metrics are proposed in this article: the overlapping rate (R_ol) and the redundancy rate (R_rd). Let Group^I = {G^I_1, G^I_2, . . . , G^I_n} denote the ideal grouping result and Group^A = {G^A_1, G^A_2, . . . , G^A_m} denote the grouping result obtained by a decomposition method A, where n and m are the numbers of groups in Group^I and Group^A, respectively. The ideal grouping result Group^I can be obtained from the parameters of the benchmark functions [45], as mentioned in Section II-A and Section S-III of the supplementary material. It should be noted that each separable variable is regarded as a single group so that the identification of separable variables can be evaluated accurately. Before calculating R_ol and R_rd, the maximum matching of the groups in Group^I and Group^A is obtained, denoted as Group^I ∩_max Group^A: each group G^I_i corresponds to the group G^A_j that has the most common variables with G^I_i. If two groups G^I_{i1} and G^I_{i2} both correspond to the same G^A_j, G^A_j chooses the one with more common variables, so that different G^I_i correspond to different G^A_j. Fig. S-8 of the supplementary material shows an example of Group^I ∩_max Group^A, where G^I_1 corresponds to G^A_1, and G^I_2 corresponds to G^A_2 or G^A_3 (whichever shares more variables with it). The overlapping rate R_ol is the ratio of the number of variables grouped into the right groups to the total number of variables in Group^I (counting repeated variables). The redundancy rate R_rd is the ratio of the number of redundant variables to the total number of variables in Group^A, where redundant variables are those included in Group^A but not in Group^I ∩_max Group^A.
R_ol and R_rd are calculated as follows:

R_ol = (Σ_{i=1}^{n} |G^I_i ∩ G^A_{j_i}|) / (Σ_{i=1}^{n} |G^I_i|)
R_rd = (Σ_{j=1}^{m} |G^A_j| − Σ_{i=1}^{n} |G^I_i ∩ G^A_{j_i}|) / (Σ_{j=1}^{m} |G^A_j|)

where G^A_{j_i} corresponds to G^I_i in the maximum matching. If R_ol = 100% and R_rd = 0%, Group^A is the same as Group^I. In general, a Group^A with a larger R_ol and a smaller R_rd is closer to Group^I; therefore, a larger R_ol and a smaller R_rd are better. If the Group^I of the graph in Fig. 1(a) is {9, 10, 11, 7, 8}, {1, 2, 5, 6}, and {3, 4, 6, 7}, the R_ol and R_rd of the grouping result in Fig. 1(c) are 92.3% and 25%, respectively, and those of the grouping result in Fig. 1(d) are 92.3% and 29.4%, respectively. This also indicates that adding the MVS after finding the leaf nodes (Rule 2) helps to decrease the R_rd of decomposition methods. With the use of R_ol and R_rd, the hyperparameter tuning is investigated in Section S-V of the supplementary material. Table I shows the R_ol and R_rd of RDG3, GDG, GDG_GDD, DG2, and DG2_GDD. The last line gives the average results (Avg.). Bold data represent the best results among all algorithms, and underlined results are the better results between two variants of a compared algorithm (e.g., GDG and GDG_GDD).
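The two metrics can be computed as in the following sketch, which assumes groups are given as sets of variable indices. The greedy pairing by descending common-variable count is one simple way to realize the one-to-one matching Group^I ∩_max Group^A; it is illustrative code, not the authors' implementation.

```python
def accuracy_metrics(ideal, obtained):
    """Compute the overlapping rate R_ol and the redundancy rate R_rd for an
    ideal grouping and an obtained grouping (lists of variable-index sets)."""
    ideal = [set(g) for g in ideal]
    obtained = [set(g) for g in obtained]
    # One-to-one matching: pair groups greedily by most common variables.
    pairs = sorted(((len(gi & ga), i, j)
                    for i, gi in enumerate(ideal)
                    for j, ga in enumerate(obtained)), reverse=True)
    used_i, used_j, common = set(), set(), 0
    for c, i, j in pairs:
        if c and i not in used_i and j not in used_j:
            used_i.add(i)
            used_j.add(j)
            common += c  # variables placed in the "right" matched group
    total_ideal = sum(len(g) for g in ideal)        # counts repeated variables
    total_obtained = sum(len(g) for g in obtained)
    return common / total_ideal, (total_obtained - common) / total_obtained
```

For example, a perfect grouping gives (1.0, 0.0), while merging two ideal groups {1, 2} and {3, 4} into a single group {1, 2, 3, 4} gives R_ol = 0.5 and R_rd = 0.5, since only one ideal group can be matched.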
For the metric R_ol, if the R_ol of an algorithm A is equal to 100%, it indicates that Group^A ⊇ Group^I. As shown in Table I, DG2_GDD obtains the highest average R_ol, and its R_ol reaches 100% on 18 of the 24 functions. Following DG2_GDD, GDG_GDD also obtains a higher R_ol than the other algorithms. Because GDG and DG2 can neither divide the overlapping functions (f13, f14, and o1–o20) nor obtain an accurate IaV matrix (f7 and f11), they have fewer groups to match Group^I, and their average R_ol is much lower than that of GDG_GDD and DG2_GDD, respectively. RDG3 forcibly divides the variables of a larger group into different smaller groups, which breaks the intergroup independence and ignores the completeness of grouping (as mentioned in Section III-A). Therefore, although the average R_ol of RDG3 is higher than that of GDG and DG2, it is lower than that of GDG_GDD and DG2_GDD.
For the metric R_rd, if the R_rd of an algorithm is equal to 0%, it indicates that Group^A contains no variables beyond its common variables with Group^I (Group^I ∩_max Group^A). As shown in Table I, DG2_GDD, followed by GDG_GDD, obtains the lowest average R_rd, and its R_rd reaches 0% on 16 functions. In addition, because the groups of GDG and DG2 remain larger without decomposition and thus contain more redundant variables after matching with the groups in Group^I, the average R_rd of GDG and DG2 is worse (i.e., higher) than that of RDG3, GDG_GDD, and DG2_GDD.
Based on a comprehensive analysis of the R_ol and R_rd results, it can be concluded that GDG_GDD and DG2_GDD obtain more accurate groups not only on the overlapping functions but also on the partially separable functions (f7 and f11). As the IaV matrix obtained by DG2 is more accurate than that obtained by GDG (i.e., Θ^DG2 is more accurate than Θ^GDG) [37], DG2 and DG2_GDD obtain more accurate groups than GDG and GDG_GDD, respectively. The grouping accuracy of RDG3 is higher than that of GDG and DG2, but lower than that of the methods combined with GDD.

C. Analysis of the Grouping Efficiency of GDD
From the observation of the grouping results, it can be seen that GDG and DG2 cannot divide the variables of overlapping components. After being combined with GDD, the average R_ol of GDG and DG2 increases and their R_rd decreases, which indicates that GDD helps GDG and DG2 obtain more accurate groups on these functions, including the overlapping functions f13, f14, and o1–o20 and the partially separable functions f7 and f11. The reason is that GDD can find the overlapping variables (i.e., the MVS) among connected components (i.e., the overlapping groups) through the IaV matrix Θ^A. Therefore, GDD can divide the overlapping functions accurately if Θ^A is close to the ideal Θ^I, such as DG2 on f13 and f14 [37]. In addition, if independent variables are wrongly judged as interactive (i.e., Θ^I(i, j) = 0 but Θ^A(i, j) = 1), GDD can also help decomposition methods overcome the wrong interaction information and obtain more accurate groups. That is because the groups of Group^I including the variables i and j will be merged into one group G^A of Group^A, but G^A can be decomposed again after GDD removes the MVS.

To validate the effectiveness of Rule 2, GDG_GDD and DG2_GDD without this rule are tested, as shown in Table II; they are denoted as GDG_GDD2 and DG2_GDD2, respectively. The last line of the table is the average value of R_ol or R_rd. The results in bold are the best among all algorithms, and the underlined results are the better ones between two variants of a compared algorithm (e.g., GDG_GDD and GDG_GDD2).
As shown in Table II, the R_rd of DG2_GDD2 is higher than that of DG2_GDD. The reason is that, as described in Section III-A, if the MVS is added right after a group is divided for the first time (as in DG2_GDD2), the MVS is always involved in the following decomposition, leaving more redundant variables in the groups. On the other hand, the problem is divided into more groups, and the number of variables in each group decreases relative to the matched Group^I. Therefore, the R_ol of DG2_GDD2 is lower than that of DG2_GDD. The differences between GDG_GDD2 and GDG_GDD are similar.
To verify the effect of Rule 2 on the solving efficiency of the LSOPs, GDG_GDD2, GDG_GDD, DG2_GDD2, and DG2_GDD are incorporated into CBCC3, which is sensitive to the grouping accuracy [33], [37], [53], with SaNSDE [54] chosen as the optimizer. Their optimization results are shown in Tables S-I and S-II of the supplementary material, respectively. The maximum number of fitness evaluations is set to 3 × 10^6. Each experiment is conducted 25 times independently. In addition, the Wilcoxon rank-sum test at a 5% significance level is used for the statistical comparisons. The last column of the tables gives the number of wins, ties, and losses of GDG_GDD or DG2_GDD against the other algorithms. From Tables S-I and S-II of the supplementary material, it can be seen that the optimization results of GDG_GDD and DG2_GDD are better than those of GDG_GDD2 and DG2_GDD2, respectively. In summary, the designed rules not only prescribe how to obtain complete grouping results after adding the MVS but also help to improve the grouping accuracy and obtain better optimization results on the LSOPs.

D. Analysis of the Grouping Fault Tolerance of GDD
From the observation of the results mentioned above, it can be concluded that GDD helps decomposition methods obtain more accurate groups not only on overlapping functions but also on some nonoverlapping functions. This also shows that GDD has the capacity for fault tolerance, since GDD decomposes the problems via the MVS deeply and recursively. In contrast, the compared decomposition methods, such as RDG3, GDG, and DG2, directly obtain the groups without mining the IaV for deep decomposition and, therefore, are not fault tolerant.
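The deep, recursive MVS-based splitting can be sketched as follows. This is a simplified illustration of the idea rather than the paper's Algorithms 1–3: the separator search is an exhaustive brute force viable only for tiny graphs, `min_size` stands in for the D/α lower bound, and the leaf step re-attaches adjacent ancestor separator variables in the spirit of Rule 2.

```python
from itertools import combinations

def components(adj, nodes):
    """Connected components of the subgraph induced by `nodes`."""
    nodes, comps = set(nodes), []
    while nodes:
        stack, comp = [nodes.pop()], set()
        while stack:
            v = stack.pop()
            comp.add(v)
            for u in adj.get(v, ()):
                if u in nodes:
                    nodes.discard(u)
                    stack.append(u)
        comps.append(comp)
    return comps

def min_vertex_separator(adj, nodes):
    """Smallest vertex set whose removal disconnects the induced subgraph
    (exhaustive search -- only viable for tiny illustrative graphs)."""
    for k in range(1, len(nodes) - 1):
        for sep in combinations(sorted(nodes), k):
            if len(components(adj, set(nodes) - set(sep))) > 1:
                return set(sep)
    return None  # e.g., a complete subgraph has no separator

def deep_decompose(adj, nodes, min_size, anc_sep=frozenset()):
    """Remove the MVS, recurse on the resulting parts, and re-attach the
    adjacent ancestor separator variables only at the leaves (Rule 2)."""
    comps = components(adj, nodes)
    if len(comps) > 1:
        return [g for c in comps
                for g in deep_decompose(adj, c, min_size, anc_sep)]
    comp = comps[0]
    sep = min_vertex_separator(adj, comp) if len(comp) > min_size else None
    if sep is None:  # leaf: complete the subproblem with its overlap variables
        return [comp | {v for v in anc_sep if adj.get(v, set()) & comp}]
    return [g for c in components(adj, comp - sep)
            for g in deep_decompose(adj, c, min_size, anc_sep | sep)]
```

For the chain-shaped interaction graph 0–1–2–3–4 with min_size = 2, this yields the overlapping groups {0, 1}, {1, 2, 3}, and {3, 4}: the separators {1} and {3} split the chain, and each is added back to the adjacent leaf groups.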
Taking DG2_GDD as an example, its grouping accuracy partially depends on the accuracy of Θ^DG2. To analyze the influence of the accuracy of Θ^DG2 on DG2_GDD, Table III shows the accuracy rate of Θ^DG2 and the grouping accuracy of DG2_GDD. ρ1, ρ2, and ρ3 are the three accuracy metrics of DG2 [36]. The data in italic type represent results that reach 100.00 only after rounding. If ρ1 = ρ2 = ρ3 = 100%, the accuracy rate of DG2 reaches 100%, and Θ^DG2 is the same as Θ^I. As shown in Table III, if Θ^DG2 is the same as Θ^I (ρ1 = ρ2 = ρ3 = 100%), the grouping accuracy of DG2_GDD also reaches 100% (R_ol = 100% and R_rd = 0%), such as on f13 and o1–o20. Therefore, although the grouping accuracy of DG2_GDD is related to the accuracy of Θ^DG2, DG2_GDD can still obtain the ideal grouping results even with an inaccurate Θ^DG2, which proves that DG2_GDD is fault tolerant. The fault tolerance of GDD stems from its decomposition ability on overlapping LSOPs. Specifically, there are three factors. First, GDD can identify the overlapping variables (MVS) among overlapping groups and divide these groups into smaller ones. As shown in Section IV-C, GDD obtains more accurate groups after dividing the wrongly merged overlapping groups, which are merged in Group^DG2 but separated in Group^DG2_GDD and Group^I. For example, as shown in Table S-III of the supplementary material, Group^DG2 merges G^DG2_GDD_6 and G^DG2_GDD_7, and Group^DG2_GDD on f7 is closer to the corresponding Group^I. Therefore, GDD obtains more accurate groups. Second, the three rules designed in Section III-A help to add the overlapping variables back so as to improve the grouping accuracy, while Rule 1 and Rule 2 avoid adding redundant variables of overlapping components. These rules help algorithms with an inaccurate Θ^A obtain more accurate groups.
For example, although Θ^DG2 on f7, f11, and f14 is inaccurate (as shown in Table III), DG2_GDD can still obtain approximately ideal groups on these functions (R_ol ≈ 100% and R_rd ≈ 0%). Third, the graph connectivity also improves the fault tolerance of GDD. For example, if Θ(1, 2) = 1 and Θ(1, 3) = 1, variables 1, 2, and 3 are assigned to one group whether Θ(2, 3) = 1 or Θ(2, 3) = 0. Accordingly, although Θ^I and Θ^DG2 on f14 are different (e.g., Θ^I(106, 225) = 1 and Θ^DG2(106, 225) = 0; the index of variables starts from 0), Group^DG2_GDD is the same as Group^I.
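This connectivity argument can be demonstrated with a small sketch. The code below is illustrative only (`groups_by_connectivity` is not part of GDD): it groups variables as connected components of an IaV matrix via union-find, showing that one miss-detected interaction entry often leaves the grouping unchanged.

```python
def groups_by_connectivity(theta):
    """Group variables as connected components of the IaV matrix `theta`
    (theta[i][j] = 1 means an interaction was detected), via union-find."""
    n = len(theta)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for i in range(n):
        for j in range(i + 1, n):
            if theta[i][j]:
                parent[find(i)] = find(j)
    comps = {}
    for v in range(n):
        comps.setdefault(find(v), set()).add(v)
    return sorted(comps.values(), key=min)

# Variables 1, 2, and 3 from the text, mapped to indices 0, 1, and 2:
# with Θ(1,2) = Θ(1,3) = 1, missing Θ(2,3) does not change the grouping.
full   = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
missed = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
```

Both matrices yield the single group {0, 1, 2}, since variable 0 keeps variables 1 and 2 connected even when the (1, 2) entry is missed.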
Comparing the results of CBCC3-DG2 and CBCC3-DG2_GDD, it can be seen that CBCC3-DG2_GDD is better than CBCC3-DG2 with a significant difference on all test functions except f11, where the grouping result of DG2_GDD is the same as that of DG2. It should be noted that CBCC3-DG2_GDD performs much better than CBCC3-DG2 on f12. This is because DG2_GDD divides the variables into several overlapping groups, which helps the CBCC3 framework solve the problem efficiently, whereas DG2 treats all variables of f12 as a single group. It can be concluded that GDD helps to improve not only the grouping accuracy but also the optimization efficiency of DG2. The comparative results of GDG and GDG_GDD are similar to those of DG2 and DG2_GDD.
From the observation of Table S-IV in the supplementary material, it can be concluded that CBCC3-DG2_GDD performs significantly better than the other algorithms on most functions. Compared with GDG_GDD, RDG3, and CCPSO2 (which applies random grouping), DG2_GDD has a higher grouping accuracy and helps CBCC3 search for solutions in a relatively clear direction within the decomposed space. Compared with CSO, SLPSO, DSPLSO, and DLLSO, which solve the problem as a whole, CBCC3-DG2_GDD decomposes LSOPs into relatively independent parts and solves them separately, which enables a broader search for solutions.
In summary, this section shows that GDD not only helps GDG and DG2 decompose the overlapping problems but also helps CCEAs outperform other excellent algorithms.

V. CONCLUSION
In this article, the GDD method was proposed to decompose overlapping LSOPs and improve the grouping accuracy. Specifically, GDD first obtains the MVS based on the IaV and then separates the overlapping groups by their MVS. In addition, GDD uses a recursive procedure and an adjustment strategy to divide the overlapping problems into groups of appropriate sizes. GDD has three advantages. First, GDD improves the grouping accuracy of decomposition methods, not only on the overlapping problems but also on partially separable problems. Second, GDD has a high fault-tolerance ability: it can help decomposition methods divide problems into approximately ideal groups even if the corresponding IaV matrix is inaccurate. Third, GDD can help CCEAs obtain better optimization results than other well-known optimization algorithms, especially on the overlapping problems.
To evaluate the grouping accuracy of decomposition methods, two new metrics were proposed: the overlapping rate (R_ol) and the redundancy rate (R_rd). Exhaustive experiments were conducted on five functions of IEEE CEC 2013 and the 20 designed overlapping functions, and the results showed that CCEAs combined with GDD achieve better performance on the LSOPs.