Transferable Adaptive Differential Evolution for Many-Task Optimization

The evolutionary multitask optimization (EMTO) algorithm is a promising approach to solving many-task optimization problems (MaTOPs), in which similarity measurement and knowledge transfer (KT) are two key issues. Many existing EMTO algorithms estimate the similarity of population distributions to select a set of similar tasks and then perform KT by simply mixing individuals among the selected tasks. However, these methods may be less effective when the global optima of the tasks differ greatly from each other. Therefore, this article proposes to consider a new kind of similarity between tasks, namely, shift invariance. Two tasks are shift invariant if they are similar after a linear shift transformation on both the search space and the objective space. To identify and utilize the shift invariance between tasks, a two-stage transferable adaptive differential evolution (TRADE) algorithm is proposed. In the first evolution stage, a task representation strategy is proposed to represent each task by a vector that embeds the evolution information. Then, a task grouping strategy is proposed to place similar (i.e., shift invariant) tasks in the same group and dissimilar tasks in different groups. In the second evolution stage, a novel successful evolution experience transfer method is proposed to adaptively select suitable parameters by transferring successful parameters among similar tasks within the same group. Comprehensive experiments are carried out on two representative MaTOP benchmarks with a total of 16 instances and on a real-world application. The comparative results show that the proposed TRADE is superior to some state-of-the-art EMTO algorithms and single-task optimization algorithms.


I. INTRODUCTION
Recently, an emerging research topic called evolutionary transfer optimization [1], [2], which combines transfer learning [3], [4], [5] and evolutionary computation (EC) [6], [7] (including evolutionary algorithms [8], [9] and swarm intelligence [10], [11], [12]), has become attractive. In the traditional EC paradigm, the optimization tasks are solved one by one without considering the relatedness and similarity among different tasks [13], [14]. However, optimization tasks seldom exist independently in practice. For example, some distinct tasks may have similarities in the function landscape or problem structure. This motivates evolutionary transfer optimization, which uses the optimization experience or the domain knowledge gained in solving some tasks (i.e., source tasks) to improve the search performance on other similar tasks (i.e., target tasks). The technique that reuses information from source tasks to help solve target tasks is called knowledge transfer (KT) [15], [16].
Among different types of optimization problems, the multitask optimization problem is one of the most representative problems related to evolutionary transfer optimization. In the multitask optimization problem, multiple optimization tasks are required to be solved simultaneously under the assumption that the tasks are similar to some extent. Therefore, a new paradigm called evolutionary multitask optimization (EMTO) has emerged to solve multitask optimization problems [17]. Different from traditional EC algorithms, an EMTO algorithm contains not only an EC algorithm as the base solver but also a KT method. Some successful designs of KT methods have helped EMTO algorithms to achieve superior performance in solving multitask optimization problems [18], [19]. Moreover, these EMTO algorithms have also shown effectiveness and efficiency in solving real-world application problems [20], [21].
Based on the success of the EMTO algorithm, researchers have begun to consider a more complex kind of optimization problem with many tasks (i.e., more than three tasks) to be solved, which is called the many-task optimization problem (MaTOP). The MaTOP is challenging because the many tasks may include some unrelated or misleading tasks. If unrelated tasks are mistakenly selected for KT, the performance of EMTO algorithms may deteriorate.
Like the multitask optimization problem [22], [23], two important issues need to be considered in solving MaTOP.
Issue 1: How to measure the similarity between the source tasks and the target task. There are multiple available source tasks in MaTOP that can be used to transfer knowledge to the target task. This is more challenging than the multitask optimization problem (MTOP), which has only one or two source tasks. Moreover, the similarities between these many source tasks and the target task are not known in advance. Therefore, how to measure these similarities to find the most suitable source task for transferring knowledge is an important issue.
Issue 2: How to transfer knowledge from source tasks to help the search process on the target task. Even when the source task and the target task are similar according to a similarity measurement, how to perform effective KT between the two tasks is still important. If the KT between tasks is not properly designed, it may lead to a negative effect called negative transfer. Negative transfer means that the interaction between tasks deteriorates the optimization performance on the target task compared to an independent search process.
To address these two issues, some EMTO algorithms have been proposed in the literature to solve MaTOP [24], [25]. Moreover, these EMTO algorithms have been successfully applied to solve real-world MaTOP, such as diversified robotic morphology designs [26], fuzzy cognitive maps [27], and robotic arm control [28], [29]. However, there are two remaining challenges and research gaps in solving MaTOP.
First, although some similarity measurements have been proposed in [30] and [31], they mainly consider the similarity/dissimilarity between the population distributions of the source task and the target task in a common search space. That is, if the two populations are closely distributed in the unified search space, the two tasks are regarded as similar. We refer to this kind of measurement as population distribution-based similarity measurement (PDSM). If the global optimal solutions of the two tasks are highly similar (e.g., intersections in the global optimum), PDSM works well since the elite individuals of one task can be easily reused to improve the fitness of other tasks. However, in real-world problems, the global optimal solutions of different tasks may differ greatly from each other. Moreover, PDSM relies on the population distribution at the current generation. Due to the randomness of the evolution process, a similarity measurement based on the population distribution estimated at only the current generation may be unstable and unreliable. Furthermore, due to the limited sample size (i.e., the population size), there may be an estimation error in the population distribution. Therefore, PDSM may not always be useful.
Second, although existing EMTO frameworks mainly use multiple populations for solving MaTOP and allow multiple populations to use different EC algorithms, many researchers adopt an EC algorithm with fixed parameter settings as the base solver for all populations. Differential evolution (DE) is a well-known EC algorithm that has many improved variants [32], [33] and applications in complex problems, such as large-scale [34], [35], multimodal [36], [37], many-objective [38], and real-world problems [39], [40]. Hence, we use DE as the base solver in this article. However, previous studies [41], [42] have shown that the parameters (e.g., F and Cr) of the DE algorithm are sensitive to the problem to be solved. For example, when using DE as the base solver, different problems may require different parameter settings of F and Cr to achieve the best optimization results. Since it is unknown in advance what kind of problem we are facing, it is difficult to set up suitable parameters. Fortunately, when dealing with MaTOP, there are some similarities between the tasks. This motivates setting up different parameters for different populations to solve different tasks, observing the performance of the different parameters, and transferring the well-performing parameters as successful evolution experience (i.e., the parameter setting of F and Cr) between similar tasks. In this way, the parameters of DE can be adaptively adjusted toward better F and Cr. To the best of our knowledge, although some efforts have been made in the literature to transfer solution knowledge or meta-knowledge among the tasks [18], [19], no research effort has been devoted to the transfer of successful evolution experience (e.g., EC algorithm parameters) among the tasks to solve MaTOP.
To address the above challenges and fill the research gap, we propose a two-stage transferable adaptive DE (TRADE) to solve MaTOP effectively and efficiently. The main contributions of this article are as follows.
First, different from PDSM, which merely considers the similarity in population distributions, we propose to consider a new kind of similarity between tasks, namely, shift invariance. Shift invariance means that the two tasks are similar after a linear shift transformation on both the search space and the objective space. The proposed similarity measurement is called shift invariance-based similarity measurement (SISM). To the best of our knowledge, no study has utilized the shift invariance between tasks to effectively solve MaTOP.
Second, to identify and capture the shift invariance between tasks, we propose a novel task representation strategy (TRS) together with a task grouping strategy (TGS), both of which are carried out in the first evolution stage. Specifically, TRADE first uses the same EC algorithm in all the populations to solve their corresponding tasks for a few generations and collects evolution information for representing the tasks. That is, the populations of all the tasks use the same EC algorithm but evolve independently without any KT. Afterward, the TRS uses the obtained evolution information to represent each task as a feature vector. Then, the TGS divides the tasks into multiple groups based on the task feature vectors, where the tasks within the same group are regarded as similar (i.e., shift invariant) in their function landscapes.
Third, to effectively reuse the knowledge from similar tasks within the same group to improve the search efficiency, we propose a novel KT method, called successful evolution experience transfer (SEET), in the second evolution stage. Specifically, the populations of the tasks in the second evolution stage of TRADE use EC algorithms with different parameters and are evolved with the SEET method to transfer knowledge of successful evolution experience (i.e., successful parameters). To perform the SEET method, we first propose an evolution quality analysis strategy to distinguish which populations of the tasks evolve well or poorly within each group. Then, the successful parameter settings of the well-evolved populations identified by evolution quality analysis are regarded as knowledge of successful evolution experience and are transferred to the poorly evolved populations to produce promising offspring within each group.
The remainder of this article is organized as follows. Section II introduces the related work on MaTOP and DE, and the motivation of this article. Section III introduces the definition of the SISM and the details of the proposed TRADE algorithm. Section IV carries out experimental studies to show the effectiveness of the proposed TRADE. The conclusion is given in Section V.

II. PRELIMINARY
The notations with their descriptions used in this article are given in Table S.I in the supplemental material.

A. Many-Task Optimization Problem
1) Problem Formulation: Suppose there are $NT$ single-objective optimization tasks in a MaTOP, and task $k$ ($k = 1, \ldots, NT$), denoted as $T_k$, can be formulated as

$$\min_{x_k \in X_k} f_k(x_k), \quad X_k \subseteq \mathbb{R}^{D_k} \tag{1}$$

where $f_k(\cdot)$ is the objective function of $T_k$, $X_k$ is the search space of $T_k$, and $D_k$ is the dimensionality of the search space. The MaTOP is the extension of the multitask optimization problem that contains more than three tasks ($NT > 3$) to be solved. The output of an EMTO algorithm for solving MaTOP contains $NT$ optimized solutions, denoted as $\{x^*_1, \ldots, x^*_{NT}\}$, one for each of the $NT$ tasks. In this article, we consider optimization tasks in a continuous search space bounded by a box constraint. The lower bound and upper bound of dimension $d$ ($d = 1, \ldots, D_k$) of $T_k$ are denoted as $LB_{k,d}$ and $UB_{k,d}$. Since the search spaces of these tasks may be different, the solutions (i.e., $x_k$ of $T_k$) are encoded into a unified search space $U \subseteq [0, 1]^{D_U}$, where $D_U = \max\{D_1, \ldots, D_{NT}\}$. The encoded solution of $T_k$ is denoted as $u_k$ and calculated as

$$u_{k,d} = \frac{x_{k,d} - LB_{k,d}}{UB_{k,d} - LB_{k,d}} \tag{2}$$

where $u_{k,d}$ is the $d$th dimension of $u_k$. Conversely, when a solution $u_k$ on $T_k$ is to be evaluated, it should be decoded to obtain the solution $x_k$ in the original search space $X_k$. If $D_k < D_U$, only the first $D_k$ dimensions of $u_k$ are decoded to obtain the solution $x_k$.
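The encoding into the unified search space and the corresponding decoding can be sketched in a few lines of Python. This is a minimal illustration that assumes the usual min-max normalization per dimension; the function names `encode` and `decode` are hypothetical and not taken from the article.

```python
from typing import List, Sequence


def encode(x: Sequence[float], lb: Sequence[float], ub: Sequence[float]) -> List[float]:
    """Map a solution from its original box [lb, ub] to [0, 1] per dimension."""
    return [(xi - lo) / (hi - lo) for xi, lo, hi in zip(x, lb, ub)]


def decode(u: Sequence[float], lb: Sequence[float], ub: Sequence[float]) -> List[float]:
    """Map a unified-space vector back to the original box.

    Only the first D_k = len(lb) dimensions of u are decoded, so u may
    have more dimensions than the task (D_k < D_U).
    """
    d_k = len(lb)
    return [lo + ui * (hi - lo) for ui, lo, hi in zip(u[:d_k], lb, ub)]


if __name__ == "__main__":
    lb, ub = [-100.0, 0.0], [100.0, 10.0]
    u = encode([0.0, 5.0], lb, ub)
    print(u)                          # unified-space coordinates
    print(decode(u + [0.9], lb, ub))  # the extra third dimension is ignored
```

Because decoding simply truncates to the first $D_k$ dimensions, populations of tasks with different dimensionalities can share one representation of length $D_U$.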
2) Multipopulation Framework for MaTOP: Although there are EMTO algorithms using a single population, most of the EMTO algorithms for MaTOP in the literature are implemented in a multipopulation framework [25], [30]. The basic multipopulation framework for MaTOP (MP4MaT) in the literature is summarized in Algorithm 1. The input of the MP4MaT framework contains a task set $T$ containing the tasks to be solved, a base solver set $A$ containing EC algorithms (EAs) with different parameter settings, and the maximum number of fitness evaluations (MAXNFE) for the optimization of all the tasks. The number of base solvers in $A$ is denoted as $NS$. The EMTO algorithm based on MP4MaT first initializes a population for each task in the unified search space $U \subseteq [0, 1]^{D_U}$ and assigns a base solver from the base solver set $A = \{EA_i\}_{i=1}^{NS}$ to each task (lines 3-6). The assigned index for $T_k$ is denoted as $\tau_k \in \{1, \ldots, NS\}$. Then, the evolution process of every task begins. In a generation, MP4MaT decides whether to perform KT on each task. If $T_k$ ($k = 1, \ldots, NT$) is going to perform KT, it first selects source tasks for KT based on a similarity measurement (lines 9 and 10). Then, the offspring of $T_k$ are produced by a KT method based on the selected source tasks (line 11). Otherwise, the offspring of $T_k$ are produced using the evolution operators of the assigned $EA_{\tau_k}$ (line 13). Afterward, the population for each task undergoes the fitness evaluation and selection process (lines 15 and 16). The evolution stops when the number of fitness evaluations (NFEs) reaches MAXNFE.
The MP4MaT framework contains two main components, which are the task similarity measurement and the KT method for transferring knowledge between tasks. The major difference between different EMTO algorithms in the literature lies in the design of the task similarity measurement and the KT method.

B. Motivation and Contribution
It should be noted that although the MP4MaT framework allows multiple populations to use different base solvers, most of the existing works use a unified base solver for all the tasks in MaTOP. That is, the base solver set $A = \{EA_i\}_{i=1}^{NS}$ ($NS = 1$) contains only one EA with its fixed parameter settings. However, many studies show that the optimization performance of an EA is highly related to its parameter settings and that the suitable parameter setting is highly related to the problem being solved [41], [42]. Since the problem to be solved is a black-box optimization problem and there is no prior knowledge, finding a suitable parameter setting of the EA may require heavy computational effort on parameter tuning. This motivates the idea that, when multiple similar tasks are solved together, the parameter settings of the EA that work well on the source tasks can be reused to improve the search efficiency on the target task.
A motivating example of the proposed methods is shown in Fig. 1. Suppose that there are two optimization tasks (i.e., $T_1$ and $T_2$) to be solved. Following the procedure of the MP4MaT framework, the search spaces of the two tasks are first mapped to a unified search space $U \subseteq [0, 1]^{D_U}$, where $D_U = \max\{D_1, D_2\}$, to allow KT. The optimal solutions (i.e., $x = 0.3$ and $x = 0.7$) of the two tasks are far from each other in the unified search space. It is assumed that the two tasks are shift invariant. That is, the objective function $f_1(x)$ of $T_1$ after linear transformation, denoted as $f_1(x-80)$, is highly similar to the objective function $f_2(x)$ of $T_2$. For a detailed definition of shift invariance, refer to Section III-A. The base solver $EA_{\tau_1}$ for $T_1$ is DE with parameter settings of F = 0.5 and Cr = 0.9, while the base solver $EA_{\tau_2}$ for $T_2$ is DE with parameter settings of F = 0.5 and Cr = 0.1. The population distributions of the two tasks at generation $g$ are also plotted in Fig. 1. It can be observed that the distance between population 1 and the optimal solution ($x = 0.3$) of $T_1$ is smaller than the distance between population 2 and the optimal solution ($x = 0.7$) of $T_2$ in the unified search space. Since the two tasks are regarded as very similar (i.e., shift invariant), it is reasonable to infer that population 1, which uses $EA_{\tau_1}$ with parameter settings of F = 0.5 and Cr = 0.9, is more successful and can solve these two tasks better than population 2, which uses $EA_{\tau_2}$ with parameter settings of F = 0.5 and Cr = 0.1. Therefore, in this article, we consider transferring these successful evolution parameters as the successful evolution experience between tasks to facilitate a more efficient search for MaTOP. To the best of our knowledge, no research attention has been paid to using different parameter settings of EA as base solvers in the MP4MaT framework and to exploiting the shift invariance-based similarity between tasks by transferring parameter settings among different tasks.
Moreover, we highlight the contribution of our proposed KT methods with the example in Fig. 1. In the case of Fig. 1, the existing PDSM considers the two tasks dissimilar since the population distributions of $T_1$ and $T_2$ are different. However, these two tasks are actually similar after a simple shift transformation, which can be properly identified by our proposed SISM method.

C. Differential Evolution
A DE algorithm mainly contains four processes: 1) initialization; 2) mutation; 3) crossover; and 4) selection. In the initialization process, the $d$th dimension of the $i$th individual ($i = 1, \ldots, NP$, where $NP$ is the population size), denoted as $x_{i,d}$, is initialized as

$$x_{i,d} = LB_d + rand \times (UB_d - LB_d) \tag{3}$$

where $rand$ denotes a randomly sampled number within [0, 1], and $LB_d$ and $UB_d$ are the lower bound and upper bound of the $d$th dimension. After initialization, the iteration of DE begins. In a generation, the population first undergoes the mutation process, in which each individual generates a mutant vector. An advanced mutation operator called DE/current-to-pbest/1 generates the mutant vector as

$$\vec{v}_i = \vec{x}_i + F \cdot (\vec{x}_{pbest} - \vec{x}_i) + F \cdot (\vec{x}_{r_1} - \vec{x}_{r_2}) \tag{4}$$

where $\vec{x}_{pbest}$ is a randomly selected individual from the top $p\%$ individuals in the population, $F$ is the scaling factor of the difference vectors, $r_1$ is an index randomly selected from $\{1, \ldots, NP\}$, and $\vec{x}_{r_2}$ is a randomly selected individual from the union of the population $POP = \{\vec{x}_1, \ldots, \vec{x}_{NP}\}$ and an archive $ARC$ storing historical solutions until generation $g$. The archive $ARC$ is empty in the initialization process of DE. Without loss of generality, we only consider optimization problems with simple box constraints in this article and, therefore, the generated vectors are clipped by $LB_d$ and $UB_d$ to satisfy the box constraints.
After the mutant vectors are generated, the crossover operator is carried out on each mutant vector to produce a trial vector. We introduce the binomial crossover here. The $d$th dimension of the $i$th trial vector at generation $g$, denoted as $u_{i,d}$, is set as

$$u_{i,d} = \begin{cases} v_{i,d}, & \text{if } rand \le Cr \text{ or } d = d_r \\ x_{i,d}, & \text{otherwise} \end{cases} \tag{5}$$

where $Cr$ is the crossover parameter for the $i$th individual and $d_r$ is a dimension randomly selected before the crossover process on the $i$th mutant vector.
After the trial vectors are generated, each trial vector is evaluated and undergoes the selection process to update the population $POP$ and the archive $ARC$ for a minimization problem as

$$\vec{x}_i^{\,g+1} = \begin{cases} \vec{u}_i, & \text{if } f(\vec{u}_i) \le f(\vec{x}_i^{\,g}) \\ \vec{x}_i^{\,g}, & \text{otherwise} \end{cases} \tag{6}$$

$$ARC = \begin{cases} ARC \cup \{\vec{x}_i^{\,g}\}, & \text{if } f(\vec{u}_i) \le f(\vec{x}_i^{\,g}) \\ ARC, & \text{otherwise.} \end{cases} \tag{7}$$

If the size of $ARC$ exceeds $NP$, then $(|ARC| - NP)$ individuals will be randomly selected and removed from $ARC$.
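The four DE processes described above can be tied together in a short sketch of a single generation. The code below is only an illustration under stated assumptions: it uses the standard current-to-pbest/1 mutation with an external archive, a tie-accepting (`<=`) selection rule, and a fraction `p_frac` in place of the "top p%" setting. The function name `de_generation` and all parameter names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def de_generation(pop: List[Vector], fit: List[float], archive: List[Vector],
                  f_obj: Callable[[Vector], float],
                  lb: Sequence[float], ub: Sequence[float],
                  F: float, Cr: float, p_frac: float = 0.1,
                  rng=random) -> Tuple[List[Vector], List[float]]:
    """One generation: mutation, binomial crossover, selection, archive update."""
    n_pop, dim = len(pop), len(pop[0])
    order = sorted(range(n_pop), key=lambda i: fit[i])   # best first (minimization)
    n_top = max(1, int(round(p_frac * n_pop)))           # assumed size of the "top p%" set
    new_pop, new_fit = list(pop), list(fit)
    for i in range(n_pop):
        x_i = pop[i]
        x_pbest = pop[rng.choice(order[:n_top])]
        x_r1 = pop[rng.randrange(n_pop)]
        x_r2 = rng.choice(pop + archive)                 # union of POP and ARC
        # Mutation, followed by clipping to the box constraints.
        v = [x_i[d] + F * (x_pbest[d] - x_i[d]) + F * (x_r1[d] - x_r2[d])
             for d in range(dim)]
        v = [min(max(v[d], lb[d]), ub[d]) for d in range(dim)]
        # Binomial crossover; dimension d_r is always taken from the mutant.
        d_r = rng.randrange(dim)
        u = [v[d] if (rng.random() <= Cr or d == d_r) else x_i[d]
             for d in range(dim)]
        # Selection: a replaced parent is stored in the archive.
        f_u = f_obj(u)
        if f_u <= fit[i]:
            archive.append(x_i)
            new_pop[i], new_fit[i] = u, f_u
    # Keep the archive size at most NP by random removal.
    while len(archive) > n_pop:
        archive.pop(rng.randrange(len(archive)))
    return new_pop, new_fit


if __name__ == "__main__":
    rng = random.Random(0)
    sphere = lambda x: sum(v * v for v in x)
    lb, ub = [-5.0] * 3, [5.0] * 3
    pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(20)]
    fit = [sphere(x) for x in pop]
    arc: List[Vector] = []
    for _ in range(50):
        pop, fit = de_generation(pop, fit, arc, sphere, lb, ub, 0.5, 0.9, rng=rng)
    print(min(fit))
```

Details such as whether $r_1$ and $r_2$ must differ from $i$ are not fixed by the description above, so the sketch leaves them unconstrained.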
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
Similar to [43], we generate $F$ and $Cr$ for the $i$th individual from a Gaussian distribution with parameters $mF$ and $mCr$ as

$$F = \mathrm{Gaussian}(mF, 0.1) \tag{8}$$

$$Cr = \mathrm{Gaussian}(mCr, 0.1). \tag{9}$$

In this article, we use DEs with different settings of $mF$ and $mCr$ as the base solvers for different tasks. In this way, we can set $mF$ and $mCr$ to different values to study their effects on solving different tasks.
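A per-individual sampling step of this kind might look as follows. This is a hedged sketch: it assumes that 0.1 is the standard deviation of the Gaussian, and the truncation of $F$ to (0, 1] and $Cr$ to [0, 1] is an added assumption that the text above does not specify. The function name `sample_f_cr` is hypothetical.

```python
import random
from typing import Tuple


def sample_f_cr(m_f: float, m_cr: float, sigma: float = 0.1,
                rng=random) -> Tuple[float, float]:
    """Draw one (F, Cr) pair around the means mF and mCr."""
    f = rng.gauss(m_f, sigma)
    cr = rng.gauss(m_cr, sigma)
    # Assumption: keep both parameters in a usable range.
    f = min(max(f, 1e-3), 1.0)
    cr = min(max(cr, 0.0), 1.0)
    return f, cr


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_f_cr(0.5, 0.9, rng=rng) for _ in range(3)])
```

Different base solvers then differ only in the pair `(m_f, m_cr)` passed to the sampler.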

A. Shift Invariance-Based Similarity Measurement
The concept of shift invariance originates from the pattern recognition task in the computer vision area [44]. Let an image with spatial resolution $H \times W$ and $C$ channels be represented by $X \in \mathbb{R}^{H \times W \times C}$. According to [44], the shift invariance is represented as

$$F(X) = F(\mathrm{Shift}_{\Delta h, \Delta w}(X)) \quad \forall (\Delta h, \Delta w) \tag{10}$$

where $F(X)$ is the output function for the input $X$, $\Delta h$ and $\Delta w$ are the shift transformations of the pixel in the vertical and horizontal directions in the image, and the shift operation is defined as

$$\mathrm{Shift}_{\Delta h, \Delta w}(X)_{h,w,c} = X_{(h - \Delta h)\%H,\ (w - \Delta w)\%W,\ c} \tag{11}$$

where % is the modulus operator. This shift transformation is called a circular shift: when the pixels after the shift hit the edge of the image, they are rolled to the other edge. Similarly, we can define the shift invariance-based similarity between optimization tasks in MaTOP as follows.
Definition 1: Two optimization tasks (i.e., $T_1$ and $T_2$) with the same dimensionality $D$ of the search space are said to be shift invariant if there exist a linear transformation with parameters $\Delta x \in \mathbb{R}^D$ and $\Delta y \in \mathbb{R}$, a connected region $X_D \subseteq \mathbb{R}^D$, and a small positive value $\varepsilon$ such that

$$\frac{1}{V} \int_{X_D} \left| \mathrm{Shift}_{\Delta x, \Delta y}(f_1(x)) - f_2(x) \right| dx < \varepsilon \tag{12}$$

where $V > 0$ is the hypervolume of $X_D$ and the shift operation on a fitness function is defined as

$$\mathrm{Shift}_{\Delta x, \Delta y}(f(x)) = f(x - \Delta x) + \Delta y \tag{13}$$

where $\Delta x$ is a $D$-dimensional vector representing the shift in the search space and $\Delta y$ is a scalar representing the shift in the objective space. The major difference between SISM and PDSM is that SISM aims to capture the landscape similarity between two tasks, while PDSM aims to capture the similarity of the global optima. Therefore, SISM is more general in identifying the similarity of tasks, as illustrated by the example in Fig. 1.
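To make the definition concrete, the following sketch estimates how far two functions are from being shift invariant for a given candidate shift. It is an illustration only and rests on assumptions: the gap is taken to be the mean absolute difference between the shifted $f_1$ and $f_2$ over a box-shaped region, the shift is taken as $f(x - \Delta x) + \Delta y$, and the integral is approximated by uniform Monte Carlo sampling. The names `shift` and `mean_abs_gap` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Func = Callable[[List[float]], float]


def shift(f: Func, dx: Sequence[float], dy: float) -> Func:
    """Return the shifted function x -> f(x - dx) + dy (assumed convention)."""
    return lambda x: f([xi - di for xi, di in zip(x, dx)]) + dy


def mean_abs_gap(f1: Func, f2: Func, dx: Sequence[float], dy: float,
                 lb: Sequence[float], ub: Sequence[float],
                 n_samples: int = 2000,
                 rng: Optional[random.Random] = None) -> float:
    """Monte Carlo estimate of the average |Shift(f1)(x) - f2(x)| over a box."""
    rng = rng or random.Random(0)
    g = shift(f1, dx, dy)
    total = 0.0
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lb, ub)]
        total += abs(g(x) - f2(x))
    return total / n_samples


if __name__ == "__main__":
    # Two sphere functions whose optima are 80 apart and whose values differ by 5.
    f1 = lambda x: sum((v + 40.0) ** 2 for v in x) + 5.0
    f2 = lambda x: sum((v - 40.0) ** 2 for v in x)
    lb, ub = [-100.0, -100.0], [100.0, 100.0]
    print(mean_abs_gap(f1, f2, [80.0, 80.0], -5.0, lb, ub))  # close to zero
    print(mean_abs_gap(f1, f2, [0.0, 0.0], 0.0, lb, ub))     # large without the shift
```

In a black-box setting the shift parameters are unknown, which is why TRADE relies on the task representation built from the search history rather than on a direct estimate of this kind.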

B. Framework of TRADE
The framework of TRADE for solving MaTOP is plotted in Fig. 2. The framework contains two evolution stages.
In the first evolution stage, the initialization process is carried out for each task. Specifically, a population of size $NP$ is created for each task with uniform random sampling in the original search space according to (3). Moreover, a unified EA ($EA_u$) with its predefined parameter setting is assigned to all the tasks. After initialization, all the populations of the tasks are evolved independently by the same evolution operators of $EA_u$ for a small number of generations ($G_1$) in the first evolution stage. Note that in this stage, the KT process is not performed. Through the evolution process in the first evolution stage, the information of the tasks is collected. After the stopping condition of the first evolution stage is satisfied, the TRS is carried out to represent each task as a vector according to the evolution information extracted from the search history. Then, the TGS divides the tasks into $NG$ groups based on the representations extracted by TRS. After TGS, the tasks that are assigned to the same group are considered shift invariant. The goal of the first evolution stage is to identify and capture the shift invariance between similar tasks based on SISM for the MaTOP. The details of the first evolution stage, including TRS and TGS of TRADE, are introduced in Section III-C.
In the second evolution stage, a solver assignment process is carried out within each group. Specifically, each task (i.e., population) is assigned a base solver with its parameter settings that is randomly selected from the base solver set $A = \{EA_i\}_{i=1}^{NS}$. Then, each population optimizes its corresponding task by using this solver. To perform SEET, the evolution quality analysis is used to identify the populations that evolve better due to using suitable parameter settings. These suitable parameter settings are regarded as knowledge of successful evolution experience. Then, the successful parameter settings (i.e., the knowledge) are transferred from well-evolved populations to poorly evolved populations to produce promising offspring. With the SEET method, the populations of the tasks can propagate the suitable parameter settings of the EA for solving similar tasks within a group. Note that an important reason why the SEET method can work well is that the tasks within a group are regarded as similar after TGS in the first evolution stage. The details of the second evolution stage, including the SEET method, are introduced in Section III-D.

C. First Evolution Stage With TRS and TGS
1) TRS: We propose a simple yet efficient TRS that identifies and captures the shift invariance between the tasks without introducing much computational cost. Suppose the evolution process for the first evolution stage lasts for $G_1$ generations. Then, $T_k$ is represented as a $G_1$-dimensional vector $\theta_k \in \mathbb{R}^{G_1}$. The best-so-far fitness of each task in every generation is collected as the evolution information. Let $Y^*_k = \{y^*_{k,1}, \ldots, y^*_{k,G_1}\}$ be a set containing the best-so-far fitness in every generation for $T_k$ in the first evolution stage. The lower bound of $Y^*_k$ is calculated from the mean and standard deviation of $Y^*_k$, where mean(·) and std(·) denote the functions for calculating the mean and standard deviation of a set of scalars, respectively. Then, dimension $g$ ($g = 1, \ldots, G_1$) of vector $\theta_k$ is calculated with the logarithm function log(·), where $\eta = 1\mathrm{E}{-25}$ is a small positive value that avoids an invalid logarithmic operation.
To illustrate how the TRS can identify and capture the shift invariance between tasks in MaTOP, an example is given in Fig. 3. In Fig. 3(a) and (b), the curves of the logarithm of fitness versus generations on three tasks are plotted on the left side, while the representations of the three tasks by TRS are plotted on the right side. Note that all tasks are evolved by the same base solver $EA_u$. In Fig. 3(a), since the three tasks have heterogeneous functions, the task representations after TRS (i.e., the curves in the right figure) are rather different. In Fig. 3(b), the three tasks are shift invariant. That is, there exists a shift transformation on the search space and the objective space such that the transformed functions are similar to each other. Therefore, in this case, the task representations after TRS are highly similar, as shown on the right of Fig. 3(b). This experimental example shows that the TRS can identify and capture the shift invariance between tasks in MaTOP.
2) TGS: Given the task representations, the TGS is carried out to divide the tasks so that, based on SISM, similar tasks belong to the same group while dissimilar tasks belong to different groups. Herein, we use the simple Euclidean distance between the vectors $\theta_k$ of different tasks to measure the shift invariance-based similarity. In this article, the classical K-means clustering algorithm [45] is used to group the tasks after the first evolution stage. Note that the parameter $K$ representing the number of clusters in the K-means algorithm is a sensitive parameter that affects the quality of the grouping process. Specifically, if the number of groups $NG$ (i.e., parameter $K$ in the K-means algorithm) is not set properly, some dissimilar tasks may be placed in the same group, and transferring knowledge between them may lead to negative transfer. Therefore, we adopt the Chinese restaurant process to determine $NG$ automatically [46].
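The grouping step can be illustrated with a small K-means routine over task vectors using the Euclidean distance. This is a sketch under assumptions rather than the article's implementation: the number of groups `k` is passed in directly instead of being determined by the Chinese restaurant process, and a deterministic farthest-point initialization is used for reproducibility. The names `euclid` and `kmeans` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two task representation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors: List[List[float]], k: int, n_iter: int = 50) -> List[int]:
    """Group task vectors into k clusters and return one group index per task."""
    # Assumption: farthest-point initialization (deterministic).
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(euclid(v, c) for c in centers))
        centers.append(list(far))
    labels: Optional[List[int]] = None
    for _ in range(n_iter):
        # Assignment step: nearest center in Euclidean distance.
        new_labels = [min(range(k), key=lambda j: euclid(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels if labels is not None else [0] * len(vectors)


if __name__ == "__main__":
    theta = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]  # toy task vectors
    print(kmeans(theta, 2))  # tasks 0 and 1 share a group, as do tasks 2 and 3
```

The returned list plays the role of the group indices (gID) used in the second evolution stage.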
The complete procedure of the first evolution stage in TRADE is shown in Algorithm 2. The CRP(·) in line 7 is an implemented function of the Chinese restaurant process with input parameters $\alpha$ and $\rho$. The default setting of these parameters is introduced in Section IV-A. The kmeans(·) in line 8 is the K-means clustering algorithm [45]. The output of the first evolution stage includes the group indices for all the tasks (gID) and the NFEs. The unified base solver $EA_u$ in this article is the DE algorithm with parameter settings of $mF = 0.5$ and $mCr = 0.5$.

3) Time Complexity Analysis: The computation in the first evolution stage mainly comes from TRS and TGS. For TRS, since each task representation vector is $G_1$-dimensional and there are $NT$ tasks, the complexity is $O(NT \times G_1)$. The time complexity of CRP(·) is $O(NT^2 \times G_1)$, and the time complexity of the kmeans(·) algorithm performed on $NT$ task vectors is $O(T \times NG \times NT \times G_1)$, where $T$ denotes the number of iterations and can be considered a constant. $NG$ is the returned result of CRP(·), which is normally much smaller than $NT$ and can also be considered a constant. That is, the major time complexity in the first evolution stage is $O(NT^2 \times G_1)$.
Taking a representative PDSM method that uses the Kullback-Leibler divergence [25] as an example, its time complexity is $O(MAXGEN \times NT^2 \times D_U)$. Note that $G_1 < MAXGEN$. That is, the complexity of TRS and TGS does not depend on $D_U$, in contrast to existing PDSM methods that calculate the similarity between the populations of every pair of tasks in every generation. Moreover, $G_1$ can be specified manually to balance accuracy and computational cost. The parameter investigation of $G_1$ is carried out in Section IV-D. Therefore, the proposed TRS and TGS are more scalable.

Algorithm 3: Second Evolution Stage of TRADE (lines 6-35)
6: Perform EQA and calculate $EQ_k$ according to Eq. (16);
7: Sort the tasks in each group based on $EQ_k$ and obtain their rank;
8: Calculate $p_{SEET}$ according to Eq. (17);
9: For k = 1 to NT Do
10:   For i = 1 to NP Do
11:     If $rand_1 < p_{SEET}$ and $rand_2 > 1/rank(T_k)$ and $gSize(gID_k) > 1$ // SEET
12:       Randomly select a source task $T_s$ from the tasks whose ranks are smaller than $gSize(gID_k)/NS$;
13:       Select an EA index $\tau_s$ according to Eq. (18);
14:       Set mF and mCr as the parameters of $EA_{\tau_s}$;
16:       Sample F and Cr with mF and mCr by Eq. (8) and Eq. (9);
17:     Else // Use base solver $EA_{\tau_k}$
18:       Set mF and mCr as the parameters of $EA_{\tau_k}$;
19:       Sample F and Cr with mF and mCr by Eq. (8) and Eq. (9);
21:     End If
22:     Undergo mutation and crossover according to Eq. (4) and Eq. (5) to obtain an offspring individual;
23:     Evaluate individual and update NFE, $x^*_k$, and $y^*_{k,g}$;
24:     Update population $POP_k$ by selection and archive $ARC_k$ according to Eq. (6) and Eq. (7);
25:     If parent is replaced by offspring produced by $EA_{\tau_s}$
26:     Else If parent is replaced by offspring produced by $EA_{\tau_k}$
28:     End If
30:   End For
31: End For
33: g = g + 1;
34: End While
35: End

D. Second Evolution Stage With SEET
The complete procedure of the second evolution stage in TRADE is shown in Algorithm 3. The input mainly includes the evolved populations {POP k } NT k=1 , evolved archives {ARC k } NT k=1 , the best-so-far fitness of all the tasks, and the group indices (gID) of all the tasks after the first evolution stage.Note that all base solvers in the base solver set A = {EA i } NS i=1 are implemented as DEs with different parameter settings of mF and mCr.The main components in the second evolution stage include solver assignment, the evolution quality analysis, and the offspring production process.
For the solver assignment process, we simply assign each task an EA with its mF and mCr that is randomly selected from the base solver set A = {EA i } NS i=1 at the beginning of the second evolution stage (line 2).
For the evolution quality analysis process, we evaluate the evolution quality of each population at the beginning of a generation (line 6). The evolution quality reflects how well an EA with its parameter settings performs on a task. Suppose that the current generation is $g > G_1$ after the first evolution stage and let $Y^*_k = \{y^*_{k,1}, \ldots, y^*_{k,G_1}, \ldots, y^*_{k,g}\}$ be a set containing the best-so-far fitness for $T_k$ from generation 1 to $g$. Then, the evolution quality $EQ_k$ of $T_k$ in solving a minimization problem is calculated by (16), where $\eta = 10^{-25}$ is a very small value to avoid zero. Since different tasks may use different base solvers in the second evolution stage (i.e., $g > G_1$), a larger $EQ_k$ indicates that $EA_{\tau_k}$ with its successful mF and mCr achieves a larger improvement on the global best fitness of $T_k$. That is, if $EQ_1 > EQ_2$, it is considered that $EA_{\tau_1}$ optimizes its task (i.e., $T_1$) more successfully than $EA_{\tau_2}$ optimizes $T_2$. After calculating $EQ_k$ for all the tasks, we sort the tasks in each group according to their $EQ_k$ in descending order (line 7). Then, the rank of each task $T_k$ based on $EQ_k$, denoted as $rank(T_k)$, is obtained. A smaller $rank(T_k)$ indicates that $T_k$ has a larger $EQ_k$ and that $EA_{\tau_k}$ is more successful.
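The per-group ranking step can be illustrated with a minimal Python sketch. It assumes the evolution quality values are already available as a plain list (the formula for them is not reproduced here), and the names `rank_tasks_within_groups`, `eq`, and `gid` are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def rank_tasks_within_groups(eq, gid):
    """Return rank[k] for every task k, where rank 1 is the largest EQ in k's group."""
    groups = defaultdict(list)
    for k, g in enumerate(gid):
        groups[g].append(k)

    rank = [0] * len(eq)
    for members in groups.values():
        # Descending EQ: the most successfully evolved task of a group gets rank 1.
        ordered = sorted(members, key=lambda k: eq[k], reverse=True)
        for position, k in enumerate(ordered, start=1):
            rank[k] = position
    return rank


# Illustrative values only: five tasks split into two groups.
eq = [0.30, 0.05, 0.90, 0.10, 0.40]
gid = [0, 0, 0, 1, 1]
ranks = rank_tasks_within_groups(eq, gid)
print(ranks)
```

Ranks are computed separately inside each group, so a task is only compared with the tasks it may later learn from.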
For the offspring production process, the offspring of a task can be produced by either the base solver EA or the proposed SEET method. Whether the SEET process occurs is determined probabilistically based on two parameters (line 11). If an offspring individual is determined to be produced by the base solver EA, it follows the procedure described in Section II-C (lines 18-20). Otherwise, the offspring individual is generated by the SEET method (lines 12-16). Specifically, to decide whether SEET is used to produce an offspring individual, two random numbers (i.e., $rand_1$ and $rand_2$) within [0, 1] are independently generated. Then, they are compared with two parameters: the probability of SEET ($p_{SEET}$) and $1/rank(T_k)$, respectively. The probability $p_{SEET}$, which controls the occurrence of SEET, is calculated by (17) (line 8), where $g$ is the current generation and MAXGEN is the maximum number of generations. It can be seen that $p_{SEET}$ gradually increases to 1 as the evolution proceeds. This is because, in the early stage, we want the different EAs to evolve their tasks independently so that the successfully evolved tasks can be distinguished. In the later stage, these successful parameters are encouraged to transfer to poorly evolved tasks to improve the search performance. Moreover, a task $T_k$ with a larger $rank(T_k)$ is considered to be a poorly evolved task and has a larger probability of learning from the successfully evolved tasks. Note that if the group size of $T_k$, denoted as $gSize(gID_k)$, is 1, the only task in the group evolves independently.
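The three-part condition on line 11 of Algorithm 3 can be sketched as follows. This is only an illustration: the value of the SEET probability is passed in as an argument rather than computed, and the function name `use_seet` and its parameters are hypothetical.

```python
import random


def use_seet(p_seet, rank_k, group_size, rng=random):
    """Decide whether one offspring of task T_k is produced by SEET (line 11)."""
    rand1 = rng.random()  # compared with the SEET probability
    rand2 = rng.random()  # compared with 1 / rank(T_k)
    return rand1 < p_seet and rand2 > 1.0 / rank_k and group_size > 1


# A poorly ranked task late in the run (p_seet close to 1) uses SEET often.
rng = random.Random(0)
trials = 10000
hits = sum(use_seet(1.0, 4, 10, rng) for _ in range(trials))
print(hits / trials)
```

Two consequences of the condition are visible in the sketch: the best task of a group (rank 1) never triggers SEET because `rand2` cannot exceed 1, and a task that is alone in its group always falls back to its own base solver.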
If an offspring individual is determined to be produced by SEET, a source task denoted as $T_s$ is first selected randomly based on the evolution quality analysis (line 12). Afterward, an EA index denoted as $\tau_s \in \{1, \ldots, NS\}$ associated with its parameter setting is selected based on the evolution experience of the selected source task $T_s$ (line 13). Finally, the offspring individual is produced by the crossover and mutation operators of the selected EA (denoted as $EA_{\tau_s}$) (lines 15 and 16). The main components of SEET are evolution experience representation, source task selection, and solver (i.e., EA) selection.
For the evolution experience representation, we define two variables, count and countSuc, which record how many times each EA has been used and how many times it has been successful on each task. At the beginning of the second evolution stage, count and countSuc are initialized as NT × NS zero matrices (lines 3 and 4). Every time a new individual is produced, the element of count that corresponds to the task and the selected EA is increased by 1 (lines 14 and 19). If the offspring individual produced by a selected EA survives through selection, the corresponding element of countSuc is updated (lines 25-28). In this way, the success rates of different EAs on a task can be calculated.

For the source task selection, a source task $T_s$ is randomly selected from the tasks whose ranks are smaller than $gSize(gID_k)/NS$ in the group of $T_k$ (line 12). This is to encourage $T_k$ to learn from the successful tasks in the group. Note that if $gSize(gID_k)/NS$ is smaller than 1, we simply select a random task from the group for $T_k$ to learn from.
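A minimal Python sketch of the source task selection rule follows. The function name and arguments are hypothetical, and one detail is an assumption of this sketch rather than something stated in the text: when no task has a rank below the threshold (for example when the group size equals NS), the sketch falls back to a random group member.

```python
import random


def select_source_task(k, rank, gid, num_solvers, rng=random):
    """Pick a source task T_s for target task T_k from T_k's own group."""
    members = [j for j in range(len(gid)) if gid[j] == gid[k]]
    threshold = len(members) / num_solvers  # gSize(gID_k) / NS

    if threshold < 1:
        # Small group: choose any task of the group at random.
        return rng.choice(members)

    elite = [j for j in members if rank[j] < threshold]
    if not elite:
        # Assumed fallback for a case the text does not cover.
        return rng.choice(members)
    return rng.choice(elite)


# Illustrative setup: a group of nine tasks and a group of two tasks, NS = 3.
gid = [0] * 9 + [1] * 2
rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]
source = select_source_task(8, rank, gid, num_solvers=3, rng=random.Random(1))
print(source)
```

With nine tasks and three solvers the threshold is 3, so only the tasks ranked 1 and 2 can act as sources for the worst-ranked task.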
For the solver selection, a solver index $\tau_s$ is selected by (18) based on the evolution experience of the source task $T_s$ (line 13). That is, we select the EA parameter setting that has the highest success rate in the evolution experience of $T_s$. Afterward, F and Cr are generated by (8) and (9) using the parameter settings mF and mCr of $EA_{\tau_s}$ (lines 15 and 16). The rest of the offspring production process is executed in the same way as (4) and (5). After the entire offspring population is produced, the evolution processes including fitness evaluation, selection, and archive update are executed (lines 23 and 24). The entire process is repeated until the stopping condition (e.g., NFE ≥ MAXNFE) is met.
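The bookkeeping with count and countSuc and the resulting solver choice can be sketched in Python as below. The sketch assumes that the success rate of an EA on a task is countSuc divided by count and that an EA that has never been used has rate 0; both the helper names and this treatment of unused EAs are assumptions made for illustration, not the exact form of (18).

```python
NT, NS = 2, 3  # illustrative sizes: two tasks, three base solvers

# count[k][i]: times EA i produced an offspring for task k
# count_suc[k][i]: times such an offspring replaced its parent
count = [[0] * NS for _ in range(NT)]
count_suc = [[0] * NS for _ in range(NT)]


def record(k, tau, survived):
    """Update the evolution experience of task k after using EA tau once."""
    count[k][tau] += 1
    if survived:
        count_suc[k][tau] += 1


def select_solver(count, count_suc, s):
    """Return the index of the EA with the highest success rate on source task s."""
    rates = [
        count_suc[s][i] / count[s][i] if count[s][i] > 0 else 0.0
        for i in range(len(count[s]))
    ]
    # Ties are resolved in favor of the lowest index.
    return max(range(len(rates)), key=lambda i: rates[i])


# Made-up experience for task 0: success rates 0.2, 0.5, and 0.75.
for used, succeeded, tau in [(10, 2, 0), (10, 5, 1), (4, 3, 2)]:
    for n in range(used):
        record(0, tau, survived=n < succeeded)

best = select_solver(count, count_suc, 0)
print(best)
```

Because the choice depends on rates rather than raw success counts, a rarely used EA with a high hit rate can still be selected over a frequently used one.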

A. Experimental Setup
1) Benchmark Problem:
In the experiments, we use two representative MaTOP benchmarks, namely: 1) CEC19MaTOP [47] and 2) GECCO20MaTOP [48], to test the effectiveness of the proposed algorithm. Both CEC19MaTOP and GECCO20MaTOP use basic functions with rather different global optima as the component tasks. These basic functions are Sphere, Rosenbrock, Rastrigin, Ackley, Griewank, Weierstrass, and Schwefel, which have heterogeneous function landscapes. For example, the Sphere function is a smooth, unimodal function, while Rastrigin is a multimodal function that contains many local optima. The CEC19MaTOP benchmark contains six 50-task problems and the GECCO20MaTOP benchmark contains ten 50-task problems. The numbers of different basic functions used in these benchmarks are listed in Table S.II in the supplemental material. Specifically, each problem in the CEC19MaTOP benchmark contains only one type of basic function, while some problems in the GECCO20MaTOP benchmark contain different types of basic functions. The only difference between the tasks that use the same basic function is that their function landscapes are shifted by different biases. Hence, the tasks that use the same basic function can be considered shift invariant according to Definition 1. Therefore, the problems that contain multiple types of basic functions (i.e., problems 4-10 in the GECCO20MaTOP benchmark) are considered more difficult and challenging.
2) Parameter Settings: The parameter settings of TRADE are listed in Table S.III in the supplemental material. In this article, we use a DE variant with a current-to-pbest mutation strategy, which has shown good ability in balancing exploration and exploitation. Then, DE with different parameter settings of mF and mCr in (8) and (9) is used as the base solvers. Specifically, mF of all the base solvers is fixed as 0.5, and mCr takes one of three available settings from {0.1, 0.5, 0.9}. Therefore, the base solver set $A$ contains three base solvers, denoted as DE/0.1, DE/0.5, and DE/0.9, with DE/0.5 as the unified base solver $EA_u$. The population size of each task (NP) is set as 100. MAXGEN is set as 1000. Since there are NT = 50 tasks in each problem, the maximum NFE (MAXNFE) is set as NT × NP × MAXGEN = 5 000 000. The maximum number of generations for the first evolution stage ($G_1$) is set as 100. The parameters α and ρ of CRP in TGS are 0.05 and 10, respectively. The algorithms are implemented in MATLAB and the experiments are conducted on a computer cluster with Intel Xeon E5-2699 v3 processors.
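The evaluation budget implied by these settings can be checked with a few lines of Python. The variable names are hypothetical, and the stage-one figure assumes one fitness evaluation per individual per generation, ignoring the evaluation of the initial populations.

```python
# Settings quoted in the text.
NT = 50          # tasks per problem
NP = 100         # population size per task
MAXGEN = 1000    # maximum number of generations
G1 = 100         # generations spent in the first evolution stage
MF = 0.5         # mF shared by all base solvers
MCR_SETTINGS = (0.1, 0.5, 0.9)  # mCr of DE/0.1, DE/0.5, DE/0.9

MAXNFE = NT * NP * MAXGEN

# Assumption of this sketch: NP evaluations per task per generation in stage one.
stage_one_nfe = NT * NP * G1

print(MAXNFE, stage_one_nfe, stage_one_nfe / MAXNFE)
```

Under that assumption, the first evolution stage consumes one tenth of the total budget with the default setting.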
3) Compared Algorithms: To test the effectiveness of the TRADE algorithm, we carry out comparisons between TRADE, EMTO algorithms, and single-task optimization algorithms. All the compared EMTO algorithms are implemented based on the MP4MaT framework. To enable a fair comparison, they all use the same base solver set (i.e., A = {DE/0.1, DE/0.5, DE/0.9}) as TRADE, and the solver assignment strategy in the compared algorithms is to randomly select a base solver from A for a given task at the beginning. The compared algorithms for solving MaTOP are EBS using DE as base solver (denoted as EBSDE) [24], MaTDE [25], MTEA-AD [28] using DE as base solver (denoted as MTDE-AD), AEMTO [29], and EMaTO [30]. The reasons for choosing these algorithms for comparison are as follows. First, all the compared EMTO algorithms are flexible in using different EAs as base solvers. Hence, although some EMTO algorithms such as EMaTO use other EAs such as GA as base solvers in the original paper, DE can still be seamlessly used in these EMTO algorithms. Second, most of these algorithms adopt PDSM for measuring the similarity between tasks, while TRADE adopts SISM. Moreover, the compared algorithms perform KT by transferring solutions, while TRADE transfers EA parameters. Hence, the comparison can validate the effectiveness of the proposed SISM and KT methods.
Different from the EMTO algorithms, the single-task optimization algorithms solve the tasks independently without KT. The compared single-task algorithms are STDE/0.1 [42], STDE/0.5 [42], STDE/0.9 [42], STDE/r, and STcDE [49]. Herein, ST stands for "single-task" and is added as the prefix of the algorithm name. Specifically, STDE/0.1, STDE/0.5, and STDE/0.9 use the parameter settings mCr = 0.1, mCr = 0.5, and mCr = 0.9, respectively. To enable a fair comparison between TRADE and these STDEs with different parameter settings, these STDEs are implemented in the same two-stage manner as TRADE. That is, in the first evolution stage, STDE uses the same unified base solver as TRADE to evolve, and in the second evolution stage, STDE uses the base solver with its predefined parameter setting (i.e., mCr) to evolve. STDE/r is an STDE variant with "r" meaning "random." Specifically, in the second evolution stage of STDE/r, an EA associated with its parameter setting of mCr is randomly selected from A to evolve. Through the comparison between TRADE and the above STDEs, we can show that TRADE can gradually adapt to the most suitable parameter setting for solving a set of similar tasks by SEET. Furthermore, STcDE is a single-task optimization algorithm using an adaptive DE with different parameters. Besides, there are other works [23] that use adaptive DE with different parameters to solve multitask optimization problems. The major difference between TRADE and these adaptive DEs is that TRADE uses cross-task knowledge to adjust parameters (e.g., mCr), while these adaptive DEs use intratask knowledge to adjust parameters. Therefore, comparing TRADE with STcDE can show the effectiveness of SEET in parameter adaptation.

4) Performance Metric:
To reduce the statistical error caused by the randomness of the optimization algorithms, all the algorithms are run independently 30 times. The best fitness obtained in each run is collected and used for comparison. Afterward, we run a Wilcoxon rank sum test at a significance level of 0.05 between our TRADE algorithm and the compared algorithms on each task of the MaTOP. The symbols "W/T/L" indicate that TRADE performs significantly better than (Win), equal to (Tie), or significantly worse than (Lose) the compared algorithm on a task, respectively. Then, the numbers of tasks on which TRADE is significantly better than, equal to, or significantly worse than the compared algorithm are counted. TRADE is said to be better than the compared algorithm on a MaTOP if the number of "W" is larger than the number of "L" and the difference between W and L is larger than 5 (i.e., 10% of the 50 tasks). The comparative result on a problem is given in parentheses by "+/=/−", meaning that TRADE is better than, equal to, or worse than the compared algorithm. To further reduce statistical comparison error, we conduct the Friedman test [50] to compare performance among the multiple algorithms and report the average ranks, denoted as AvgRank. A smaller rank indicates better performance and the best result is marked in boldface. Moreover, the multiple comparison test [51] is adopted and the calculated p-values are given for each benchmark.
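The per-problem verdict derived from the W/T/L counts can be written as a small Python function. The text states the rule only for the "better" case; treating the "worse" case symmetrically is an assumption of this sketch, and the function name `problem_verdict` is hypothetical. The sketch returns an ASCII "-" for the worse case.

```python
def problem_verdict(wins, ties, losses, margin=5):
    """Map per-task W/T/L counts on one MaTOP to '+', '=', or '-'."""
    if wins - losses > margin:
        return "+"  # TRADE is better on this problem
    if losses - wins > margin:
        return "-"  # assumed symmetric rule for the worse case
    return "="


# Illustrative counts over the 50 tasks of one problem.
print(problem_verdict(30, 15, 5))
print(problem_verdict(10, 35, 5))
print(problem_verdict(2, 8, 40))
```

The margin of 5 corresponds to 10% of the 50 tasks, so small imbalances between wins and losses are reported as a tie at the problem level.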

1) Comparison Between TRADE and EMTO Algorithms:
The comparative results between TRADE and the EMTO algorithms are shown in Tables I and II. Moreover, the detailed results of the EMTO algorithms on the 50 tasks of a representative MaTOP (i.e., GECCO20MaTOP5) are given in Table S.IV in the supplemental material. According to the numbers of +/=/− in Table I and the AvgRank in Table II, we can observe that TRADE significantly outperforms AEMTO, EBSDE, EMaTO, MaTDE, and MTDE-AD on most of the problems in the CEC19MaTOP benchmark and the GECCO20MaTOP benchmark. Specifically, in the CEC19MaTOP benchmark, where all the problems contain homogeneous tasks using the same basic function such that all the tasks can be considered shift invariant, the proposed TRADE based on SISM is more effective than the compared algorithms based on PDSM. This is because when the global optima of different tasks in a problem differ greatly, PDSM may become less effective. On the contrary, TRADE utilizes a new kind of similarity, namely, shift invariance, which can still be useful when the global optima of the tasks differ greatly. Moreover, in the more complex GECCO20MaTOP benchmark, where a MaTOP (e.g., GECCO20MaTOP5) contains heterogeneous tasks using different basic functions, TRADE still outperforms the compared EMTO algorithms. This indicates that TRADE can handle more difficult MaTOPs with heterogeneous tasks than the compared EMTO algorithms.
2) Comparison Between TRADE and Single-Task Algorithms: The summarized comparative results between TRADE and the single-task optimization algorithms are shown in Tables III and IV. The detailed comparative results are shown in Tables S.V and S.VI in the supplemental material. Moreover, the numbers of best results obtained by the single-task EAs with different parameter settings (mCr) on the tested benchmarks are shown in Table V. From Tables III-V, we can obtain several important observations and conclusions. First, on the CEC19MaTOP benchmark with homogeneous tasks, the performances of EAs with different parameter settings are rather different on different problems. For example, Table V shows that STDE/0.9 obtains the best results on 50 tasks in problem 1, which only contains the Rosenbrock function, while STDE/0.5 obtains the best results on 50 tasks in problem 2, which only contains the Ackley function.
Second, although STDEs with different parameter settings have their corresponding advantages on different problems, the proposed TRADE achieves generally better results than these STDE variants in terms of average rank. This is because TRADE with the SEET method can adaptively transfer successful parameters for solving a set of shift invariant tasks and reduce the effect of harmful parameters in the search process. STDE/r represents the expected performance of a DE algorithm without knowing in advance which parameter among mCr = 0.1, mCr = 0.5, and mCr = 0.9 is best. Encouragingly, TRADE outperforms STDE/r on most of the problems in the CEC19MaTOP benchmark. This indicates that TRADE is effective when we have no prior knowledge about which parameter works better on a black-box problem.
Third, the proposed TRADE generally outperforms STDE/0.1, STDE/0.5, STDE/0.9, and STDE/r on the more complex GECCO20MaTOP benchmark that contains heterogeneous tasks. Specifically, TRADE obtains more + than − and achieves the smallest average rank on the two benchmarks. The results indicate that TRADE is capable of handling more difficult MaTOPs with heterogeneous tasks.
Fourth, the proposed TRADE, which adapts parameters by transferring successful parameters from other tasks, outperforms STcDE, which adapts parameters by the knowledge within the task, on 12 problems in the two benchmarks. STcDE also uses the same base solver set A as TRADE does and increases the usage of the successful parameters in the search process. In summary, TRADE can be regarded as using cross-task knowledge to adapt parameters, while STcDE can be regarded as using intratask knowledge to adapt parameters. These results indicate the effectiveness of the SEET method that adapts parameters by KT.

C. Component Analysis
1) Effects of TRS and TGS in the First Evolution Stage:
The first evolution stage in TRADE mainly contains the TRS and TGS. TRS represents each task so that the shift invariance-based similarity between tasks can be measured, while TGS groups similar tasks into the same group based on SISM.
To verify the effectiveness of TRS, we formulate a TRADE variant called TRADE-PDSM. In TRADE-PDSM, the population center after the first evolution stage is used to represent a task instead of the representation produced by TRS. As can be seen from the name, TRADE-PDSM represents a task based on PDSM, which tries to capture the similarity in the global optima between tasks. On the contrary, the proposed TRS can be used to capture the shift invariance between tasks. Ideally, after TRS and TGS in the first evolution stage, the tasks using the same basic functions will be grouped into the same group.
The comparative result between TRADE and TRADE-PDSM is given in Table S.VII in the supplemental material. TRADE behaves slightly worse than TRADE-PDSM on the CEC19MaTOP benchmark and outperforms TRADE-PDSM on the GECCO20MaTOP benchmark. This is because problems in CEC19MaTOP contain homogeneous tasks using the same basic functions, and the grouping strategy has no significant effect on these homogeneous tasks. However, on MaTOPs that contain several heterogeneous tasks with different basic functions, such as problems 5-8 in the GECCO20MaTOP benchmark, TRADE significantly outperforms TRADE-PDSM. This indicates that the proposed TRS based on SISM is more effective in solving more complex MaTOPs than TRADE-PDSM based on PDSM.
To verify the effectiveness of TGS, we formulate a TRADE variant called TRADE-w/o-TGS. In TRADE-w/o-TGS, the TGS is removed. That is, after the first evolution stage, all the tasks are grouped into a single group. The comparative result between TRADE and TRADE-w/o-TGS is given in Table S.VII in the supplementary material. The results in Table S.VII in the supplementary material show that TRADE achieves performance similar to TRADE-w/o-TGS on the CEC19MaTOP benchmark and outperforms TRADE-w/o-TGS on 4 problems of the GECCO20MaTOP benchmark. The advantage of TRADE is significant on MaTOPs (e.g., problem 5) that contain heterogeneous tasks. This indicates that TGS is effective: it groups similar tasks so that knowledge is transferred between these tasks to improve search efficiency.
To directly show the effect of TRS and TGS, we plot the grouping results after an independent run of TRADE on the GECCO20MaTOP benchmark in Fig. 4. The data after the first evolution stage are collected and TRS is employed to represent each task. Based on these task representations, we adopt the t-distributed stochastic neighbor embedding (t-SNE) [52] to represent these tasks in 2-D space. In Fig. 4, the left figure in each subfigure is the ground truth for the grouping of the tasks. For example, problem 4 in the GECCO20MaTOP benchmark uses three types of basic functions: Sphere, Rosenbrock, and Ackley. Hence, there are three groups of tasks, and the tasks in a group that use the same basic function are considered shift invariant based on SISM. However, this information is not known in advance, and TRS and TGS are proposed with the aim of capturing the shift invariance. It can be seen from the right figure of each subfigure in Fig. 4 that TRS represents the tasks well. Specifically, the distances between similar tasks (i.e., from the same group) are small and the distances between dissimilar tasks (i.e., from different groups) are large after TRS. Then, based on the representations obtained by TRS, the tasks can be grouped into different groups by TGS. For example, in the right figure of Fig. 4(b), the grouping result after TGS is very close to the ground truth of the groups. These results further indicate the effectiveness of the proposed TRS and TGS in measuring shift invariance between tasks.
2) Effects of SEET in the Second Evolution Stage: To verify the effectiveness of the SEET method, we formulate three TRADE variants: TRADE/0, TRADE/0.5, and TRADE/1. The number after "TRADE/" represents the fixed parameter setting of $p_{SEET}$. For example, in the second evolution stage of TRADE/0, $p_{SEET}$ is fixed to 0. Hence, TRADE/0 represents the variant in which no KT method takes effect and each task evolves independently in the second evolution stage. The comparative results of TRADE and the compared variants are shown in Table S.VII in the supplementary material. TRADE significantly outperforms TRADE/0 on both benchmarks. This indicates that the proposed SEET method is effective. Moreover, TRADE generally outperforms TRADE/0.5 and TRADE/1 on the GECCO20MaTOP benchmark. Since TRADE/0.5 and TRADE/1 use fixed settings of $p_{SEET}$, the results indicate the effectiveness of adapting $p_{SEET}$ when solving complex MaTOPs. Moreover, we formulate a TRADE variant called TRADE-self to verify the effectiveness of the parameter adaptation by SEET. TRADE-self is formulated based on the self-adaptive DE in [49] and only uses intratask information to adaptively select the EA, with its associated parameter settings, that produces offspring. It should be noted that the major difference between TRADE and TRADE-self is that TRADE uses knowledge from other tasks to adapt parameters, while TRADE-self uses knowledge from its own task to adapt parameters independently. The comparative result between TRADE and TRADE-self is given in Table S.VII in the supplementary material. TRADE generally outperforms TRADE-self, which further indicates that the parameter adaptation brought by the SEET method is effective.
To directly show the parameter adaptation behavior of TRADE, the usage frequency of different EA parameters, for different EAs initially assigned at the beginning of the second evolution stage in TRADE on CEC19MaTOP1, is plotted in Fig. 5. Note that all tasks in each problem of the CEC19MaTOP benchmark use the same basic function. Combining this with Table V, we observe that in the left figure of Fig. 5, the EA initially assigned to a task of problem 1 in CEC19MaTOP has a parameter setting of mCr = 0.1, which is not an ideal parameter for solving the task. However, by the SEET method, the successful parameter mCr = 0.9, which is a good parameter according to Table V, is transferred to this task to help improve search efficiency. As a result, the usage of mCr = 0.9 gradually increases while the usage of mCr = 0.1 gradually decreases as the search process proceeds. In the right figure of Fig. 5, we observe that when the EA initially assigned to a task has the most suitable parameter (mCr = 0.9), the usage of this parameter increases throughout the search process. This observation further indicates the effectiveness of the proposed SEET method.

D. Parameter Sensitivity Analysis
In this section, we investigate the parameter sensitivity of $G_1$, which controls the maximum number of generations used in the first evolution stage in TRADE. We run experiments on the two benchmarks with different settings of $G_1 \in \{20, 40, 60, 80, 120, 140, 160, 180\}$ in TRADE. Note that the default setting of TRADE is $G_1 = 100$. The TRADE variants are TRADE-20, TRADE-40, . . ., TRADE-180, respectively. The comparative results between TRADE and the compared variants are shown in Table S.VIII in the supplemental material. The results indicate that a larger $G_1$ can lead to better search performance. This is because when $G_1$ is set to a larger value, more evolution information is collected and used to represent the tasks by TRS. Then, TGS can group the tasks more accurately such that similar (i.e., shift invariant) tasks belong to the same group while dissimilar tasks belong to different groups. As a result, TRADE can improve search performance by sharing successful parameters among shift invariant tasks by the SEET method. However, setting a larger $G_1$ can lead to a higher computational cost since the computational time complexity of TGS is related to the dimensionality of the task representation. To strike a balance between accuracy and computational cost, we choose to set $G_1$ to a moderate value of 100.

E. Real-World Application Study
To study the effectiveness of the TRADE algorithm on real-world applications, we consider the robotic arm control problem [29]. In a robotic arm control task, the goal is to find the angle of each joint in the robotic arm such that the distance between the end position of the arm and the target position is minimized. A candidate solution to the task is represented as a D-dimensional vector $\vec{\alpha} = \{\alpha_1, \ldots, \alpha_D\}$ containing the angle of each joint. The end position of a solution is denoted as $P_{\vec{\alpha}}$ and the target position is denoted as $P_{tar}$. Hence, the objective is formulated as
$$f(\vec{\alpha}, P_{tar}) = \lVert P_{\vec{\alpha}} - P_{tar} \rVert.$$
Fig. 6(a) gives an example of a robotic arm control task with three joints. In our setting, the beginning position denoted as $P_0$ and the length of each joint of the robotic arm are fixed. Hence, we can generate different robotic arm control tasks by setting different target positions to construct a robotic arm control MaTOP.
An important property of this problem is that there exists shift invariance between tasks. Specifically, when the distances from the target positions of two tasks to the beginning position $P_0$ are the same, the two tasks are shift invariant. We give a simple proof here. Suppose there are two tasks $T_1$ and $T_2$ with target positions $P_{tar,1}$ and $P_{tar,2}$, respectively, such that $\lVert P_0 - P_{tar,1} \rVert = \lVert P_0 - P_{tar,2} \rVert$. This means that $P_{tar,2}$ can be obtained by rotating $P_{tar,1}$ around the center $P_0$ by a certain angle denoted as $\Delta\alpha$. Then, for each solution $\vec{\alpha}_1 = \{\alpha_{1,1}, \alpha_{1,2}, \ldots, \alpha_{1,D}\}$ of $T_1$, there exists a shift transformation with vector $\Delta\vec{\alpha} = \{\Delta\alpha, 0, \ldots, 0\}$ such that
$$f(\vec{\alpha}_1, P_{tar,1}) = f(\vec{\alpha}_1 + \Delta\vec{\alpha}, P_{tar,2}).$$
According to Definition 1, the two tasks are shift invariant. An illustrating example of the shift invariance is given in Fig. 6(b) with two robotic arms with two joints. The two end positions of the two arms have the same distances to their corresponding target positions. Moreover, the solution of one arm only needs to change the first dimension $\alpha_1$ by the angle $\Delta\alpha$ to obtain the solution of the other arm.

Based on the above discussions, we formulate five robotic arm control MaTOPs whose target positions of the tasks are shown in Fig. 7(a)-(e). The beginning position $P_0$ (red circle) of the arm is fixed at (0, 0). The target positions are distributed on circles with the same or different radii. We study these MaTOPs to investigate two important questions: Q1) can the proposed TRADE algorithm distinguish which tasks are similar (i.e., shift invariant) in these real-world MaTOPs? and Q2) can the proposed TRADE algorithm make use of the shift invariance between tasks to improve the overall search performance to efficiently solve these real-world MaTOPs?
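The shift invariance between two robotic arm tasks whose targets are equally far from the beginning position can be checked numerically with a short Python sketch. It rests on assumptions that the text does not spell out: a planar arm, joint angles measured relative to the previous link, and equal link lengths of 1. The function names and the sample values are hypothetical, and `delta` stands for the rotation angle between the two targets.

```python
import math


def end_position(angles, link_length=1.0, p0=(0.0, 0.0)):
    """Forward kinematics of a planar arm with relative joint angles (assumed model)."""
    x, y = p0
    heading = 0.0
    for a in angles:
        heading += a  # each joint angle is added to the direction of the previous link
        x += link_length * math.cos(heading)
        y += link_length * math.sin(heading)
    return x, y


def objective(angles, target, link_length=1.0):
    """Distance between the end position of the arm and the target position."""
    x, y = end_position(angles, link_length)
    return math.hypot(x - target[0], y - target[1])


def rotate(point, delta):
    """Rotate a point around the origin P_0 = (0, 0) by the angle delta."""
    c, s = math.cos(delta), math.sin(delta)
    return (c * point[0] - s * point[1], s * point[0] + c * point[1])


angles = [0.3, -0.5, 0.8]           # a candidate solution of the first task
target_1 = (1.2, 0.9)
delta = 0.7
target_2 = rotate(target_1, delta)  # same distance to P_0 as target_1

shifted = [angles[0] + delta] + angles[1:]  # shift only the first dimension
f1 = objective(angles, target_1)
f2 = objective(shifted, target_2)
print(f1, f2)
```

Under the assumed model, adding the rotation angle to the first joint rotates the whole arm around the beginning position, so the two printed objective values agree up to floating-point error.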
In our experimental setting, each MaTOP contains 50 tasks with D = 10. We compare the TRADE algorithm with AEMTO, EBSDE, EMaTO, MaTDE, and MTDE-AD. They all adopt a population size of 50 for each task. We set MAXGEN = 200. $G_1$ is set as 20 for the TRADE algorithm. The grouping results of the tasks after the first evolution stage of TRADE are plotted in Fig. 7(f)-(j) for MaTOP1-MaTOP5, respectively. TRADE can approximately group similar tasks, whose target positions have the same [i.e., Fig. 7(f)-(g)] or similar [i.e., Fig. 7(h)-(j)] distance to the beginning position, into the same group. These results indicate that the proposed algorithm can distinguish which tasks are similar, which answers Q1. The comparative results after 20 independent runs are shown in Table VI. The TRADE algorithm generally outperforms the compared algorithms in terms of the final average fitness. These results indicate that the proposed algorithm can make use of the shift invariance to achieve better search performance, which answers Q2.

V. CONCLUSION
In this article, we considered a new kind of task similarity (i.e., shift invariance) and proposed a two-stage TRADE algorithm that can solve MaTOPs efficiently. The TRS and TGS in the first evolution stage were efficient in identifying shift invariance between tasks and in grouping similar tasks. The SEET method in the second evolution stage was efficient in transferring successful parameters among the tasks within the same group. In this way, the similarity between the tasks within the same group was exploited to improve the search efficiency. The proposed TRADE algorithm has shown promising performance in solving MaTOPs. However, there are still some issues that can be further studied in the future. For example, the TRADE algorithm consumes a certain amount of extra fitness evaluations to identify similar tasks, and the TRS may not be accurate enough to distinguish similar tasks when a MaTOP contains many types of heterogeneous tasks. Therefore, for future work, researchers can consider the following two aspects: 1) discovering a more efficient and accurate TRS for identifying and capturing shift invariance between tasks and 2) extending the scope of similarity between tasks, such as rotation invariance between function landscapes of the tasks and biobjective similarity (i.e., shape and domain) of the tasks [53].

Fig. 3. Examples of TRS. (a) Three tasks with heterogeneous function landscapes. (b) Three tasks with shift-invariant function landscapes.

Algorithm 3 Second Evolution Stage
Input:
  $T = \{T_k\}_{k=1}^{NT}$: Task set;
  $A = \{EA_i\}_{i=1}^{NS}$: Base solver set;
  $\{POP_k\}_{k=1}^{NT}$: Evolved populations after stage one;
  $\{ARC_k\}_{k=1}^{NT}$: Evolved archives after stage one;
  $Y^*_k = \{y^*_{k,1}, \ldots, y^*_{k,G_1}\}$: The set containing best-so-far fitness at every generation after stage one;
  $gID = \{gID_k\}_{k=1}^{NT}$: Group indices of all the tasks;
  NFE: Number of fitness evaluations after stage one;
  MAXNFE: Maximum number of fitness evaluations;
  g: Current generation after stage one;
Output:
  $X^* = \{x^*_k\}_{k=1}^{NT}$: The best solutions for each task;
1: Begin
2:   Assign each $T_k$ an $EA_{\tau_k}$ along with its mF and mCr that is randomly selected from $A = \{EA_i\}_{i=1}^{NS}$;
3:   count = zeros(NT, NS); // counter of the used times of different EA for each task
4:   countSuc = zeros(NT, NS); // counter of the successful times of different EA for each task
5:   While NFE < MAXNFE

Fig. 4. Grouping results on the problems in the GECCO20MaTOP benchmark based on TRS after the first evolution stage. Ground truth of the groups (left) and the groups formed by TGS (right) after applying the t-SNE dimensionality reduction method to the task representations obtained by TRS after the first evolution stage on (a) GECCO20MaTOP1, (b) GECCO20MaTOP4, and (c) GECCO20MaTOP8.

Fig. 5. Usage frequency of different EA parameters during the evolutionary process with different initially assigned EA of mCr = 0.1, mCr = 0.5, and mCr = 0.9, respectively, at the beginning of the second evolution stage in TRADE for solving CEC19MaTOP1.

Fig. 6. (a) Example of robotic arm with three equal links. (b) Illustration of shift invariance in two robotic arm control tasks.


Fig. 7. (a)-(e) Target positions of multiple tasks in the robotic arm control MaTOP1-MaTOP5. (f)-(j) Corresponding grouped results on the tasks of the robotic arm control MaTOP1-MaTOP5 by TRADE.

Algorithm 1 MP4MaT Framework

TABLE VI COMPARATIVE RESULTS BETWEEN TRADE AND OTHER EMTO ALGORITHMS ON ROBOTIC ARM CONTROL MATOP