Orthogonal Transfer for Multitask Optimization

Knowledge transfer (KT) plays a key role in multitask optimization. However, most existing KT methods still face two challenges. First, tasks commonly have different dimensionalities (DDs), which makes KT between heterogeneous search spaces very difficult. Second, tasks may have different degrees of similarity in different dimensions, so treating all dimensions as equally important can harm the KT process. To address these two challenges, this article proposes a novel orthogonal transfer (OT) method, enabled by a cross-task mapping (CTM) strategy, that achieves high-quality KT among heterogeneous tasks. For the first challenge, the CTM strategy maps the global best individual of one task from its original search space to the search space of the target task via an optimization process, which handles the difference in task dimensionality. For the second challenge, the OT method is performed on the CTM-obtained individual and a random individual of the target task to find the best combination of the dimensions of these two individuals rather than treating all dimensions equally, thereby achieving high-quality KT. To verify the effectiveness of the proposed OT method and the resulting OT-based multitask optimization (OTMTO) algorithm, this article not only uses the existing multitask optimization benchmark but also proposes a new benchmark test suite of multitask optimization problems (MTOPs) with DDs. Comprehensive experimental results on the existing and the proposed benchmarks show that, compared with existing competitive evolutionary multitask optimization (EMTO) algorithms, the proposed OT method and the OTMTO algorithm are very advantageous in providing high-quality KT and in handling the heterogeneity of search spaces in MTOPs.


I. INTRODUCTION
Evolutionary computation (EC), inspired by natural selection and genetics, is a family of population-based approaches for solving optimization problems [1], [2]. The population contains multiple individuals, each representing a candidate solution to the problem. After random initialization, the population undergoes reproduction to produce the offspring of the next generation. Then, the fitness of each individual is evaluated to measure its solution quality, which helps maintain the elite individuals; this process is called fitness evaluation (FE). The evolutionary process repeats until a stopping criterion is met. EC algorithms have been successfully applied to many complex optimization problems, such as large-scale [3]-[5], dynamic [6], [7], multimodal [8], [9], multi-/many-objective [10]-[12], and expensive [13]-[15] problems, as well as real-world applications [16]-[20].
Recently, evolutionary multitask optimization (EMTO), an emerging paradigm that uses EC algorithms to solve multiple optimization tasks simultaneously, has developed rapidly in the EC community [21]. The problem addressed by EMTO is called a multitask optimization problem (MTOP), where each task corresponds to an optimization problem. The basic idea of EMTO is that optimization problems seldom need to be solved independently from scratch; instead, the similarity between tasks can be exploited to search more efficiently. In the EMTO search paradigm, multiple tasks are handled simultaneously under the assumption that they share some similarity. Therefore, during the search process, information from other similar tasks can be transferred and reused to improve the performance on the current task. This information-sharing mechanism between similar tasks is called knowledge transfer (KT). Many EMTO algorithms based on various KT methods have been proposed and have shown great success in improving the search ability on similar optimization tasks [22]-[24].
Despite the success brought by EMTO algorithms, the design of effective KT still faces two main challenges, including the difference in the task dimensionality and the unequal importance of different dimensions.
First, regarding the difference in task dimensionality, an MTOP may contain optimization tasks with different dimensionalities [21]. Herein, an MTOP with different dimensionalities (DDs) is denoted as MTOP-DD. However, many existing KT methods are based on the unified solution representation (USR) strategy [22]-[24]. The USR strategy pads the individuals of lower-dimensional tasks with redundant dimensions so that all tasks share the same dimensionality. KT is then implemented as crossover between these unified representations of individuals from different tasks. These methods are therefore not well suited to MTOP-DD, because the difference in dimensionality between tasks can lead to negative KT. For example, when KT is performed from a low-dimensional task to a high-dimensional task, the redundant dimensions that the USR strategy adds to the individual of the low-dimensional task are also transferred to the high-dimensional task. However, these redundant dimensions have no effect on the fitness calculation of the low-dimensional task, so their quality is unknown. Transferring them to the high-dimensional task may therefore cause low-quality or even negative KT.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Second, for the challenge of unequal importance of different dimensions, it is known that some dimensions of other tasks contain important information that can be beneficial to the evolutionary search of the current task, while some dimensions are less important or even contain misleading information. A simple example is that the two tasks have some similar dimensions in their global optima, while other dimensions differ greatly. In this case, it is not reasonable to treat all transferred dimensions with the same importance.
Besides the above two main challenges, another challenge in KT is how to make the best use of knowledge between different dimensions of different tasks because transferring information across the similar dimensions of different tasks may be useful. Some KT methods in the literature try to shuffle the dimensions of the tasks to transfer knowledge across different dimensions [25], [26]. However, if the KT based on dimension shuffle does not consider the similarity between different dimensions, it may cause negative transfer by transferring information between the dimensions that have a large gap in the search distribution.
To address the above challenges, we propose a novel KT method named orthogonal transfer (OT) to more efficiently solve the MTOP, especially for solving the MTOP-DD. Thus, an OT-based multitask optimization (OTMTO) algorithm is proposed, which has the following four main features.
First, we propose a cross-task mapping (CTM) strategy to enable the KT between heterogeneous tasks with DDs. Specifically, by modeling the mapping process as an optimization problem and solving it by a differential evolution (DE) algorithm, the global best individual of one task can be well mapped to the search space of the target task and the cross-task knowledge can be efficiently transferred to the target task.
Second, based on the CTM-obtained transferred individual, the OT method is performed on this individual and a random individual of the target task. Since these two individuals are under the same search space (i.e., the search space of the target task), the negative KT caused by the DDs can be reduced.
Moreover, the OT method adopts the orthogonal experimental design (OED) method to find the best combination of dimensions between these two individuals (i.e., the CTM-obtained transferred individual and the randomly selected individual of the target task). The OED method implicitly learns the importance of different dimensions. In this way, some useful information of other tasks is transferred to the target task while the good information of the target task can be preserved.
Third, to make the best use of different dimensions between different tasks and to avoid negative transfer, a similarity-based cross-dimension transfer (CDT) method is proposed. The CDT method is based on the similarity of the search distribution on the dimensions between the tasks to transfer the knowledge. In the CDT method, the information of some dimensions in the source task is transferred to their most similar dimensions in the target task, which can further improve the quality of KT.
Fourth, the occurrences of the OT and CDT processes are controlled by two probability parameters, which are adaptively adjusted according to the benefits brought by the OT process and the CDT process, respectively. This mechanism adaptively controls the intensity of KT brought by OT and CDT so as to encourage positive KT.
Therefore, the OTMTO algorithm is built on the OT method, enabled by the CTM strategy and combined with the CDT method. CTM and OT tackle the two main challenges of current KT methods, i.e., the difference in task dimensionality and the unequal importance of different dimensions, respectively, while CDT tackles the challenge of making the best use of knowledge across different dimensions of different tasks. In the experiments, unlike existing studies that mainly consider MTOPs whose tasks have the same dimensionality, we verify the advantages of OTMTO not only on the traditional MTOP benchmark but also on a newly proposed MTOP-DD benchmark test suite, by comparing it with baseline and state-of-the-art algorithms. The experimental results show that OTMTO has a significant advantage in solving both traditional MTOPs and the proposed MTOP-DDs.
The remainder of this article is organized as follows. Section II briefly introduces the concept and development of the MTOP, the motivation of this article, and the OED method. Then, Section III presents the details of the proposed OTMTO algorithm. Section IV provides the experimental studies. Finally, Section V gives the concluding remarks of this article.

A. Evolutionary Multitask Optimization
Herein, we give the formulation of the MTOP. Without loss of generality, we assume that there are K tasks, all of which are single-objective minimization problems. Suppose that the ith task has an objective function $f_i: X_i \rightarrow \mathbb{R}$ and the solution space $X_i \subseteq \mathbb{R}^{D_i}$, where $D_i$ is the dimensionality of the ith task. Then, an MTOP that contains K tasks can be formulated as
$$\{x_1^*, x_2^*, \ldots, x_K^*\} = \Big\{\arg\min_{x_1 \in X_1} f_1(x_1), \arg\min_{x_2 \in X_2} f_2(x_2), \ldots, \arg\min_{x_K \in X_K} f_K(x_K)\Big\}$$
where $x_i^*$ is the global optimum of the ith task.
The traditional EC algorithm solves the MTOP in a single-task manner; that is, it solves one task in a single run. However, in real-world applications, problems seldom exist in isolation, and there is often some synergy among problems of a specific class. Therefore, the key idea of EMTO is to reuse the knowledge learned from solving one problem to improve the performance on other similar problems. As a result, how to perform KT between tasks is essential to designing an effective EMTO algorithm.

B. Related Work
The proposal of EMTO is inspired by transfer learning that achieves better learning performance by sharing common knowledge between similar tasks. Different from transfer learning, the EMTO paradigm emphasizes the optimization process using EC algorithms. However, similar considerations can still be drawn from transfer learning [27] to improve multitask optimization performance. That is, researchers consider the issues of "what and how to transfer" and "when to transfer." For the first issue of what and how to transfer, researchers are interested in designing effective KT methods to transfer common knowledge among tasks. These existing works can be mainly categorized as single population-based methods and multipopulation-based methods.
For single population-based methods, researchers mainly evolve a single population that contains individuals of different tasks and perform KT between individuals. Multifactorial optimization is one of the most popular approaches to solving MTOPs with a single population [22], and the corresponding algorithm is called the multifactorial evolutionary algorithm (MFEA). MFEA uses assortative mating and vertical cultural transmission to enable KT among tasks and has shown promising performance on continuous and discrete MTOPs. Following MFEA, many studies have been carried out to improve KT in EMTO. Bali et al. [23] proposed linearized domain adaptation to improve KT in MFEA by minimizing the distribution gap between two heterogeneous tasks. Zhou et al. [24] proposed an adaptive KT method that uses multiple crossover operators to transfer knowledge and identifies the crossover operator best suited to each task based on the information collected during the evolutionary process.
For multipopulation-based methods, researchers mainly evolve one population per task and perform KT between populations based on evolutionary information such as the search distribution. Wu and Tan [25] proposed a multitasking genetic algorithm (MTGA) that reduces the bias between the optima of different tasks and enables KT between tasks with DDs by randomly shuffling the dimensions. MTGA is a simple yet efficient EMTO algorithm; however, its random dimension shuffle does not consider the similarity between dimensions. Feng et al. [28] proposed a multipopulation-based EMTO algorithm with an explicit autoencoder, which allows KT between tasks of different dimensionalities. Zhou et al. [29] proposed using a kernelized autoencoder to perform a nonlinear mapping between populations to achieve effective KT between tasks. Li et al. [30] proposed a multipopulation framework with a new mutation operator that allows KT between multiple populations. Very recently, Li et al. [31] found that transferring task-specific knowledge, such as high-quality solutions, between tasks is not sufficient, and proposed a novel KT method that transfers meta-knowledge between the populations of the tasks. Meta-knowledge is "knowledge (e.g., how to obtain high-quality solutions) of knowledge (e.g., the high-quality solutions themselves)," which is more general and can be transferred between tasks of high, medium, or low similarity.
For the second issue of when to transfer, researchers study in which situations knowledge should or should not be transferred. Since the similarity between tasks is usually unknown in advance, KT between tasks is not always useful. Therefore, an important issue is to estimate the similarity between tasks and to encourage KT when the tasks are similar. To this end, Zheng et al. [32] proposed a self-regulated EMTO algorithm, in which the intensity of cross-task KT is enhanced when the tasks show some similarity during the search process. Bali et al. [33] proposed MFEA2 with an online parameter estimation strategy that estimates the similarity between tasks, so that positive KT can be encouraged at suitable times when the tasks are considered similar.

C. Motivation
In this article, we also consider both "what and how to transfer" and "when to transfer" in designing powerful KT. Many existing methods perform KT on the USR of individuals from different tasks. That is, all individuals from all tasks are encoded into a unified search space, denoted as U, whose dimensionality is $D_U = \max\{D_i\}$, $i = 1, \ldots, K$. All dimensions of the individuals from the ith task are normalized to [0, 1] according to the lower and upper bounds of the solution space of the ith task, denoted as $x_i^{\min}$ and $x_i^{\max}$, respectively. When an individual from the ith task is evaluated, it is decoded into the solution space of the ith task, $X_i \subseteq \mathbb{R}^{D_i}$. Suppose that $u_i$ is an individual from the ith task in U; then the first $D_i$ dimensions of $u_i$ are decoded as the candidate solution $x_i$ for the ith task. That is,
$$x_i = x_i^{\min} + u_i[1{:}D_i] \odot (x_i^{\max} - x_i^{\min})$$
where $u_i[1{:}D_i]$ denotes the first $D_i$ dimensions of $u_i$ and $\odot$ is the elementwise product.
However, this KT method may lead to negative KT and thus fail to fully realize the benefits of the synergy between similar tasks. For example, when KT is performed from the ith task to the jth task in U, where $D_i < D_j$, the redundant dimensions of the individual of the ith task (i.e., its last $(D_j - D_i)$ dimensions) may be transferred to the individual of the jth task. These redundant dimensions have no effect on the fitness of the ith task and may therefore be poorly optimized, so transferring them can lead to negative KT. To handle MTOP-DD, a KT method that works on the original search spaces of the source and target tasks rather than on the unified search space is needed. The CTM strategy is proposed to enable such a KT method.
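A minimal NumPy sketch of USR encoding and of the decoding step described above makes the redundant-dimension problem concrete. The function names `encode_usr` and `decode_usr` are hypothetical, and filling the redundant dimensions with uniform random values is an assumption made only for illustration, since the text does not say how they are initialized.

```python
import numpy as np


def encode_usr(x, lower, upper, d_unified, rng):
    """Encode a solution of task i into the unified space U = [0, 1]^{D_U}.

    The first D_i entries are the normalized decision variables. The remaining
    D_U - D_i entries are redundant; filling them with uniform random values is
    an illustrative assumption.
    """
    x = np.asarray(x, dtype=float)
    u = rng.random(d_unified)                      # redundant entries stay random
    u[: x.size] = (x - lower) / (upper - lower)    # normalize the real variables
    return u


def decode_usr(u, lower, upper):
    """Keep the first D_i entries of u and rescale them to [x_min, x_max]."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    return lower + np.asarray(u, float)[: lower.size] * (upper - lower)


# Task 1 is 2-D and task 2 is 4-D, so D_U = 4.
rng = np.random.default_rng(0)
lo1, up1 = np.full(2, -5.0), np.full(2, 5.0)
lo2, up2 = np.full(4, -100.0), np.full(4, 100.0)
u1 = encode_usr([1.0, -1.0], lo1, up1, 4, rng)
print(decode_usr(u1, lo1, up1))   # recovers task 1's solution
print(decode_usr(u1, lo2, up2))   # entries 3 and 4 come from redundant dimensions
```

When `u1` is handed to the 4-D task, half of its decoded entries never influenced task 1's fitness, which is exactly the negative-KT risk the paragraph describes.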
Moreover, different dimensions of two tasks often have different levels of importance and similarity. Most existing crossover-based KT methods use a single crossover probability for all dimensions. Therefore, low-quality information of the source task (e.g., dimensions in which the global optima of the two tasks differ greatly) may also be transferred to the target task. Such indiscriminate crossover can severely break building blocks and degrade the performance of EMTO algorithms. To handle this issue, the OT and CDT methods, which are aware of the importance and similarity of dimensions, are proposed.
Furthermore, the issue of "when to transfer" should also be considered to improve the KT in EMTO algorithms. Motivated by this, a simple and efficient adaptive control mechanism of the occurrence of the KT is proposed in this article.

D. Orthogonal Experimental Design
OED is a method for designing experiments with multiple factors, each having multiple levels. The objective of OED is to find the best combination of levels of the different factors, i.e., the design that leads to the best performance according to the scientific or engineering requirements. OED has enjoyed great success in many engineering applications [54]. The key idea of OED is to select representative combinations based on orthogonality and to run experiments on these combinations to determine the likely best combination. Compared with exhaustive methods, which run all possible combinations (a number that grows exponentially with the number of factors), OED requires far fewer experiments to predict a high-quality combination. Hence, OED is an economical and efficient experimental design method. Basically, the OED method contains two main components: 1) the orthogonal array (OA) and 2) factor analysis. Suppose that there are D factors and that each factor has two available levels. Then, there are $2^D$ possible experimental settings in total. To use the OED method, we obtain the OA by the process proposed in [55]. The OA is a predefined $M \times D$ table, denoted by $L_M(2^D)$, where $M = 2^{\lceil \log_2(D+1) \rceil}$. Each row of the OA represents a combination of factor levels, and each element $OA_{i,j}$ gives the level of factor j selected in the ith combination. In this two-level example, each element $OA_{i,j}$ is either 1 or 2, and each column of the OA contains equal numbers of 1s and 2s. Notably, M is much smaller than $2^D$ when D is sufficiently large. For details of the construction process of the OA, refer to Section A in the supplementary material. Then, we conduct experiments on the M combinations given by the OA and obtain their experimental results.
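A two-level OA is easy to generate. The sketch below is not the procedure of [55] (which the text defers to the supplementary material); it uses an alternative standard construction over GF(2), in which the entry for row a and column c is 1 plus the parity of the bitwise AND of a and c. This also yields an $L_M(2^D)$ with $M = 2^{\lceil \log_2(D+1) \rceil}$: every column is balanced, and every pair of columns contains each level pair equally often, although the column order may differ from [55]. The function name `orthogonal_array` is hypothetical.

```python
import numpy as np


def orthogonal_array(d):
    """Return a two-level OA L_M(2^D) as an (M, D) array with levels 1 and 2."""
    m = 1
    while m < d + 1:              # M = 2^ceil(log2(D + 1))
        m *= 2
    # Row index a = 0..M-1, column label c = 1..D (distinct nonzero labels < M).
    bits = np.arange(m)[:, None] & np.arange(1, d + 1)[None, :]
    parity = np.zeros_like(bits)
    while bits.any():             # parity of popcount(a & c), one bit at a time
        parity ^= bits & 1
        bits >>= 1
    return parity + 1             # map parity {0, 1} to levels {1, 2}


oa = orthogonal_array(7)
print(oa.shape)                   # 8 experiments instead of 2^7 = 128
print(oa)
```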
After the M experiments are conducted, we perform factor analysis to evaluate the effect of each level of each factor and build the best combination from the experimental results. Let $f_i$ denote the experimental result of the ith combination and $S_{jk}$ denote the effect of the kth level of the jth factor; then, $S_{jk}$ is calculated as
$$S_{jk} = \frac{\sum_{i=1}^{M} f_i \cdot I(OA_{i,j} = k)}{\sum_{i=1}^{M} I(OA_{i,j} = k)}$$
where $I(\cdot)$ returns 1 if the condition in parentheses is satisfied and 0 otherwise. After all $S_{jk}$ values are calculated, we derive the predicted best combination by selecting, for each factor j, the level k with the best $S_{jk}$. For a minimization problem, for example, the jth factor of the derived combination takes the level k with the smallest $S_{jk}$.
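Factor analysis then takes only a few lines. This sketch computes $S_{jk}$ as the mean result of the experiments that set factor j to level k and picks the level with the smallest effect for each factor, assuming minimization. The function name `factor_analysis` and the small $L_4(2^3)$ table in the example are illustrative. Because the example objective is additive across factors, the predicted combination (2, 2, 2) is the true optimum even though it is not one of the four experiments.

```python
import numpy as np


def factor_analysis(oa, results):
    """Level effects S[j, k-1] and the predicted best level of each factor (minimization)."""
    oa = np.asarray(oa)
    f = np.asarray(results, dtype=float)
    effects = np.empty((oa.shape[1], 2))
    for level in (1, 2):
        mask = (oa == level)                           # I(OA_{i,j} = level)
        effects[:, level - 1] = (f[:, None] * mask).sum(axis=0) / mask.sum(axis=0)
    return effects, effects.argmin(axis=1) + 1         # levels are 1-based


# L_4(2^3); every factor costs 1 at level 1 and 0 at level 2.
oa = [[1, 1, 1], [1, 2, 2], [2, 1, 2], [2, 2, 1]]
results = [sum(1 for level in row if level == 1) for row in oa]   # [3, 1, 1, 1]
effects, best = factor_analysis(oa, results)
print(best)   # predicted combination (2, 2, 2), not among the four rows
```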
OED has been widely used in traditional EC (e.g., single-task optimization) to improve search performance. Leung and Wang [56] used OED to aid population initialization and to improve the crossover operator, enhancing the robustness of the EC algorithm. Ho et al. [57] used OED to find better combinations of the partial vectors of the two individuals generated by cognitive learning and social learning, which can deal with high-dimensional optimization problems. Zhan et al. [55] proposed an orthogonal learning particle swarm optimization (OLPSO) algorithm, which uses OED to construct a better guiding exemplar from the global best individual and the personal best individual to solve complex single-objective optimization problems in a single-task manner, and achieved encouraging results. Later, the orthogonal learning strategy became a promising approach for enhancing the global search ability of various EC algorithms [58], [59]. However, most studies adopt OED to improve search performance in single-task optimization. To the best of our knowledge, no research has used OED to perform cross-task KT in multitask optimization. Inspired by the success of OED in improving single-task optimization performance, we propose a novel OT method that transfers knowledge between tasks to benefit the search process.

III. OTMTO
In this section, the multipopulation framework for solving MTOP-DD is first introduced. Afterward, the CTM strategy, which operates on the original search spaces, is introduced, followed by the details of the OT method. Next, the CDT method, which further exploits the similarity between the global optima of the tasks, is presented. Finally, the complete OTMTO algorithm is given, followed by the time complexity analysis.

A. Multipopulation Framework
Since MFEA was proposed to exploit the implicit parallelism of EC algorithms to solve MTOPs with a single population, many follow-up studies have proposed improved versions of MFEA. Despite the rationality of MFEA and its variants, this single population-based search paradigm still faces three limitations. First, as discussed in [34], its scalability from multitask to many-task optimization is restricted: when the number of tasks becomes large (e.g., more than three), MFEA becomes less effective. This is because MFEA rarely considers the similarity between the source and target tasks, so source tasks that differ greatly from the target task can cause negative KT on the target task. Second, MFEA performs crossover-based KT on the USR of individuals from different tasks. As mentioned in Section II-C, this can lead to negative transfer because the crossover-based KT may transfer irrelevant dimensions of the source task. Third, MFEA adopts implicit KT; that is, it uses the same evolutionary search operators for multiple tasks within a single population. However, different tasks might require different search operators, and implicit KT restricts the flexibility of EMTO algorithms.
Therefore, we adopt the multipopulation framework instead of the single-population framework to address MTOP-DD in this article. Specifically, we maintain one population for each task, and during the evolutionary process each population evolves independently on its own task. One of the most salient features of the multipopulation framework is that it allows different evolutionary operators to be used [60], [61]. That is, different EC algorithms or optimization tools can solve different tasks in different populations while KT is performed between or among them. Since different tasks might require different search mechanisms and different intensities of KT from other tasks, the multipopulation framework is more flexible and general for solving MTOP-DD. In this article, we implement the multipopulation framework sequentially: in every generation, after the population of one task finishes its evolutionary process, the KT process is carried out. Another advantage of the multipopulation framework is that it allows a parallel computing implementation.
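The sequential multipopulation loop described above can be sketched as follows. The callbacks `evolve` and `transfer` are hypothetical placeholders for a task-specific EC operator and for the KT step (for example, CTM followed by OT); they are not part of any library, and which tasks act as sources is left to the callback.

```python
from typing import Callable, List

Population = list


def run_multipopulation(
    populations: List[Population],
    evolve: Callable[[int, Population], Population],
    transfer: Callable[[int, List[Population]], None],
    generations: int,
) -> List[Population]:
    """Sequential multipopulation EMTO loop with one population per task."""
    for _ in range(generations):
        for i in range(len(populations)):
            # Each task may use its own search operator inside `evolve`.
            populations[i] = evolve(i, populations[i])
            # KT is carried out right after task i finishes its generation.
            transfer(i, populations)
    return populations


# Toy usage: record the order in which evolution and KT happen.
log = []


def evolve(i, pop):
    log.append(("evolve", i))
    return [x + 1 for x in pop]


def transfer(i, pops):
    log.append(("kt", i))


final = run_multipopulation([[0], [10]], evolve, transfer, generations=2)
print(final)      # [[2], [12]]
print(log[:4])    # evolve 0, kt 0, evolve 1, kt 1
```

Because each task only touches its own slot in `populations`, the inner loop is also the natural place to parallelize, as the paragraph notes.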
Although some multipopulation-based EMTO algorithms have been studied in the literature, our proposed OTMTO algorithm differs from them in two aspects. First, OTMTO is flexible because it can handle not only MTOPs with the same dimensionality but also MTOP-DD. Second, OTMTO achieves more effective KT between tasks by considering both the importance and the similarity of different dimensions.

B. CTM Strategy
In this section, we consider how to map the global best individual of the jth task (i.e., the source task), denoted as $x_{gb,j}$, from its search space $[0,1]^{D_j}$ to obtain a mapped global best individual, named $x_{mgb,j}$, that is suitable for the ith task (i.e., the target task) in the search space $[0,1]^{D_i}$. The concept of KT in optimization originates from KT in transfer learning [7], [27]. In transfer learning, the goal is to improve the learning performance on one task by transferring knowledge from another task, which alleviates the burden of expensive data labeling. KT in transfer learning has enjoyed great success. One of the key issues in transfer learning is domain knowledge alignment, which reduces the bias of the data distributions in a common knowledge space (e.g., a latent feature space) and correctly matches data from different domains. Following the idea of domain knowledge alignment, we formulate the CTM process as an optimization problem that aims to minimize the gap between the hypotheses of the target task i and the source task j, denoted as $H_t$ and $H_s$, respectively. A hypothesis $H: X \rightarrow Y$ is a mapping from the search space X to the objective space Y. In the following, the optimization model of the CTM strategy is introduced in Section III-B1). Then, the design principle of hypothesis alignment and the connection of the optimization model to machine learning are explained in Section III-B2).
1) Optimization Model: We denote the population of the ith task (i.e., the target task) as $pop_i$ and the kth individual in $pop_i$ as $pop_{k,i}$; all populations have the same size, denoted as ps. Similarly, the kth individual of the jth task is $pop_{k,j}$. First, the individuals of each of the two populations are sorted from best to worst. We denote the number of best individuals selected from each population to construct the mapping as nb. Then, an nb-dimensional temporary feature vector $tvec_j$ of the jth task is defined, whose kth ($1 \le k \le nb$) dimension is the distance between $x_{gb,j}$ and the kth best individual in $pop_j$ (i.e., $pop_{k,j}$):
$$tvec_j[k] = \lVert x_{gb,j} - pop_{k,j} \rVert_2 \tag{3}$$
Similarly, the nb-dimensional feature vector of the ith task, denoted as $fvec_i$, is defined, whose kth ($1 \le k \le nb$) dimension is the distance between $x_{map}$ and the kth best individual in $pop_i$ (i.e., $pop_{k,i}$):
$$fvec_i[k] = \lVert x_{map} - pop_{k,i} \rVert_2 \tag{4}$$
where $x_{map}$ is a candidate solution vector that is optimized to approach $x_{mgb,j}$. Then, we perform a scale transformation between the two populations to obtain the feature vector of the jth task, denoted as $fvec_j$:
$$fvec_j = tvec_j \cdot \frac{rad_i}{rad_j} \tag{5}$$
where $rad_i$ and $rad_j$ are the search radii of $pop_i$ and $pop_j$, respectively. Specifically, $rad_i$ is calculated from the distances between the individuals in $pop_i$ and $ctr_i$, where $ctr_i$ denotes the arithmetic mean center of $pop_i$; $rad_j$ is calculated in the same way. Then, the optimization model is formulated as
$$x_{mgb,j} = \arg\min_{x_{map} \in [0,1]^{D_i}} \lVert fvec_j - fvec_i \rVert_2$$
Once $fvec_j$ is calculated, it is fixed throughout the optimization process, whereas $fvec_i$ is calculated from the candidate solution $x_{map}$. Note that (5) rescales distances measured in the search space $[0,1]^{D_j}$ to the distance scale of the search space $[0,1]^{D_i}$. In this optimization problem, we directly find a mapped individual $x_{mgb,j}$ in the original search space of the ith task by considering the information of the jth task. Hence, the CTM process can be viewed as extracting knowledge from the source task.
If nb = 1, the optimal mapped individual $x_{mgb,j}$ for this optimization problem is simply $x_{gb,i}$: because $pop_{1,j} = x_{gb,j}$, $fvec_j$ reduces to a single zero, and choosing $x_{map} = x_{gb,i}$ makes $fvec_i$ zero as well, so $\lVert fvec_j - fvec_i \rVert_2 = 0$. To bring in diversity and provide $x_{mgb,j}$ with better dimensions for the OT process, we set nb = 5. The fitness of a candidate solution $x_{map}$ to be minimized is called the internal fitness and is calculated as
$$\mathrm{InternalFitness}(x_{map}) = \lVert fvec_j - fvec_i \rVert_2 \tag{8}$$
Fig. 1 shows an example of the internal fitness calculation, where the jth task is a 3-dimensional (3-D) optimization problem and the ith task is a 2-dimensional (2-D) optimization problem. In this example (nb = 3), to calculate the fitness of a candidate mapped individual $x_{map}$ on the right-hand side of Fig. 1, the best nb individuals of the ith task (i.e., the target task) and the jth task (i.e., the source task) are first selected. Then, $fvec_j$ is calculated according to (3) and (5), while $fvec_i$ is calculated according to (4). Finally, $\mathrm{InternalFitness}(x_{map})$ is calculated according to (8).
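A NumPy sketch of the internal fitness may help. It assumes Euclidean distances in (3) and (4) and, because the exact radius formula is not stated in this text, takes the search radius to be the mean Euclidean distance of the individuals to their arithmetic mean center; both choices and all helper names are assumptions. The final lines reproduce the nb = 1 case, where $x_{gb,i}$ attains zero internal fitness.

```python
import numpy as np


def search_radius(pop):
    """Assumed radius: mean Euclidean distance to the arithmetic mean center."""
    pop = np.asarray(pop, dtype=float)
    return float(np.linalg.norm(pop - pop.mean(axis=0), axis=1).mean())


def make_internal_fitness(pop_i, fit_i, pop_j, fit_j, nb=5):
    """Return InternalFitness(x_map) for mapping x_gb,j into task i (minimization)."""
    pop_i, pop_j = np.asarray(pop_i, float), np.asarray(pop_j, float)
    best_i = pop_i[np.argsort(fit_i)[:nb]]        # nb best of the target task
    best_j = pop_j[np.argsort(fit_j)[:nb]]        # nb best of the source task
    tvec_j = np.linalg.norm(best_j - best_j[0], axis=1)              # Eq. (3)
    fvec_j = tvec_j * search_radius(pop_i) / search_radius(pop_j)    # Eq. (5)

    def internal_fitness(x_map):
        fvec_i = np.linalg.norm(best_i - np.asarray(x_map, float), axis=1)  # Eq. (4)
        return float(np.linalg.norm(fvec_j - fvec_i))                       # Eq. (8)

    return internal_fitness


rng = np.random.default_rng(1)
pop_i, pop_j = rng.random((20, 2)), rng.random((20, 3))   # 2-D target, 3-D source
fit_i, fit_j = pop_i.sum(axis=1), pop_j.sum(axis=1)       # toy objectives
x_gb_i = pop_i[np.argmin(fit_i)]
print(make_internal_fitness(pop_i, fit_i, pop_j, fit_j, nb=1)(x_gb_i))  # 0.0
print(make_internal_fitness(pop_i, fit_i, pop_j, fit_j, nb=5)(x_gb_i))
```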
Since the optimization problem can be complex when nb > 1, we solve it with DE/best/1, a DE variant with fast convergence, because we aim to locate the optimal or a near-optimal solution as quickly as possible. Suppose that the population size of the internal optimization process is denoted as inps. We then use the best inps individuals of $pop_i$ as the initial internal population, denoted as inpop, instead of uniform random initialization over the whole search space. This is because the individuals in $pop_i$ tend to have better internal fitness than uniformly sampled points in the search space, which enhances search efficiency. The process terminates when the number of internal FEs, denoted as INFE, exceeds the predefined maximum number of internal FEs, denoted as maxINFE. After the termination condition is met, the individual with the best internal fitness is output as $x_{mgb,j}$. The details of the CTM process are shown in Algorithm 1.
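The internal optimizer can be sketched as a plain DE/best/1 with binomial crossover that starts from a given initial population and stops once its budget of internal evaluations is used up. The scale factor F = 0.5, the crossover rate CR = 0.9, the clipping to [0, 1], and the function name `de_best_1` are assumptions, not settings stated in the text; the shifted sphere function merely stands in for the internal fitness.

```python
import numpy as np


def de_best_1(fitness, init_pop, max_infe, F=0.5, CR=0.9, seed=0):
    """Minimize `fitness` on [0, 1]^D with DE/best/1/bin; return (x_best, f_best)."""
    rng = np.random.default_rng(seed)
    pop = np.array(init_pop, dtype=float)          # plays the role of inpop
    fit = np.array([fitness(x) for x in pop])
    infe = len(pop)                                # internal FEs used so far
    n, d = pop.shape                               # needs n >= 3
    while infe < max_infe:
        best = pop[fit.argmin()].copy()            # best of the current generation
        for k in range(n):
            if infe >= max_infe:
                break
            others = [r for r in range(n) if r != k]
            r1, r2 = rng.choice(others, size=2, replace=False)
            mutant = best + F * (pop[r1] - pop[r2])          # DE/best/1 mutation
            cross = rng.random(d) < CR                       # binomial crossover
            cross[rng.integers(d)] = True                    # at least one mutant gene
            trial = np.clip(np.where(cross, mutant, pop[k]), 0.0, 1.0)
            f_trial = fitness(trial)
            infe += 1
            if f_trial <= fit[k]:                            # greedy one-to-one selection
                pop[k], fit[k] = trial, f_trial
    i_best = fit.argmin()
    return pop[i_best], float(fit[i_best])


# Toy usage: a shifted sphere stands in for InternalFitness.
target = np.full(4, 0.3)
sphere = lambda x: float(np.sum((x - target) ** 2))
inpop = np.random.default_rng(5).random((10, 4))
x_best, f_best = de_best_1(sphere, inpop, max_infe=3000)
print(x_best, f_best)
```

Always mutating around the current best is what gives DE/best/1 its fast, exploitative convergence, which matches the stated goal of finding a good $x_{mgb,j}$ quickly.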
2) Explanation for the CTM Mechanism: Recall that the distance-weighted k-nearest neighbor (kNN) algorithm [62] is a canonical classification algorithm. The classifier H learned from the data is a hypothesis. The predicted label $\tilde{y}$ of a solution vector x is obtained by combining the labels of the k nearest neighbors of x, with each neighbor $x_i$ weighted according to its distance $dist(x, x_i)$ to x. Here, $dist(\cdot, \cdot)$ is the distance metric between two vectors, N(x) is the set containing the k nearest neighbors of x according to this distance metric, and $y_i$ ($i \in \{1, \ldots, k\}$) is the label of $x_i$. Note that $N(x) \subset X$, where X is the whole training dataset.
Algorithm 1 (CTM process), steps 4-7:
4 Evolve inpop for one generation by using DE/best/1 with the internal fitness function of (8);
5 End While
6 Find the individual $x_{map}$ with the best internal fitness in inpop and output it as $x_{mgb,j}$;
7 End
In the proposed optimization model, the predicted label $\tilde{y}$ is the predicted rank of a solution x in the population of a task. Note that there are two learned hypotheses, $H_s$ and $H_t$, and two training datasets, $X_s$ and $X_t$, for the source and target tasks, respectively. The two hypotheses aim to correctly predict the rank of a solution x on their corresponding tasks based on the training data $X_s$ and $X_t$ from the two populations. Minimizing the difference between $fvec_j$ and $fvec_i$ in (8), which belong to the source task j and the target task i, respectively, reduces the difference between the distance-based weights used in the kNN rank predictions of the two solutions, i.e., $H_s(x_{gb,j})$ and $H_t(x_{mgb,j})$. In this way, the proposed optimization model in CTM can be regarded as minimizing the gap between the learned $H_s(x)$ and $H_t(x)$. The aligned hypotheses are special cases of kNN with k = nb, where $X_s$ contains the best nb solutions in the population of the source task and $X_t$ contains the best nb solutions in the population of the target task. Thus, by implicitly aligning the hypotheses of the two tasks, cross-task knowledge from the source task can be transferred to help the search process of the target task.
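To make the kNN view concrete, here is a small sketch of a distance-weighted kNN that predicts a solution's rank from its nearest neighbors. Using weights $1/dist(x, x_i)$ and a weighted average of the neighbor ranks is an assumption chosen for illustration (one common variant, not necessarily the form used in the original equation); the function name and the small epsilon that guards against division by zero are also hypothetical.

```python
import numpy as np


def knn_predict_rank(x, X, ranks, k, eps=1e-12):
    """Distance-weighted kNN: weighted average of the ranks of the k nearest points."""
    X = np.asarray(X, dtype=float)
    dist = np.linalg.norm(X - np.asarray(x, dtype=float), axis=1)
    nn = np.argsort(dist)[:k]                     # N(x), the k nearest neighbors
    w = 1.0 / (dist[nn] + eps)                    # closer neighbors weigh more
    return float(np.dot(w, np.asarray(ranks, dtype=float)[nn]) / w.sum())


# Training set: the best solutions of a population and their ranks (1 = best).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
ranks = [1, 2, 3, 4]
print(knn_predict_rank([0.5, 0.5], X, ranks, k=3))   # equidistant neighbors, about 2.0
print(knn_predict_rank([0.1, 0.0], X, ranks, k=3))   # pulled toward the rank-1 point
```

The prediction depends on x only through the distances to the nb stored solutions, which is why matching the distance vectors $fvec_j$ and $fvec_i$ aligns the two predictors.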

C. OT Method
The occurrence of the CTM and OT processes from the jth task to the ith task is controlled by a probability parameter, denoted as p_OT,i,j. Specifically, if a random number generated uniformly within [0, 1] is less than p_OT,i,j, the CTM is carried out and the OT process is performed. The parameter p_OT,i,j is adaptively adjusted according to the benefit brought by the OT process. To achieve KT between tasks, we perform the OT method in the original search space rather than in the unified search space. To address the differences in the search ranges of the decision variables (i.e., dimensions) and enable KT, we normalize every dimension of the candidate solutions to [0, 1] according to the lower and upper bounds of the original search space. During evolution, all the individuals of the ith task and the jth task are encoded in the search spaces [0, 1]^{D_i} and [0, 1]^{D_j}, respectively. Suppose that we are going to transfer knowledge from the jth task to the ith task, where i ≠ j and 1 ≤ i, j ≤ K. The global best individual of the jth task is selected as the transferred knowledge and denoted as x_gb,j, which belongs to the normalized search space [0, 1]^{D_j}. We first execute the CTM process to map x_gb,j from the search space [0, 1]^{D_j} to the mapped individual x_mgb,j in the search space [0, 1]^{D_i} of the target task. Note that D_i is not necessarily equal to D_j. Afterward, OED is performed on x_mgb,j and an individual randomly selected from pop_i, denoted as x_rk,i, where rk ∈ {1, . . . , ps}. Since x_mgb,j is constructed from the information of the jth task, it can be viewed as cross-task knowledge.
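The normalization and the probabilistic trigger can be illustrated in a few lines. This sketch uses hypothetical helper names and 0-based task indices; the uniform choice of a source task j ≠ i follows the simple strategy used in the complete algorithm.

```python
import random


def normalize(x, lb, ub):
    """Map a solution from its original box [lb, ub] to [0, 1]^D."""
    return [(xi - lo) / (hi - lo) for xi, lo, hi in zip(x, lb, ub)]


def denormalize(z, lb, ub):
    """Map a normalized solution back to the original search space."""
    return [lo + zi * (hi - lo) for zi, lo, hi in zip(z, lb, ub)]


def pick_source(i, K, rng=random):
    """Uniformly choose a source task j != i among K tasks (0-based)."""
    j = rng.randrange(K - 1)
    return j if j < i else j + 1


def transfer_triggered(p, rng=random):
    """True if a U(0, 1) draw falls below the transfer probability p."""
    return rng.random() < p
```

Because every task lives in its own [0, 1]^{D_i}, only the dimensionality, not the variable ranges, still differs between tasks after normalization.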
When OED is performed on the two individuals x_rk,i and x_mgb,j, a better OT solution, denoted as x_OT, is generated to guide the evolution of the target task i. Specifically, the OED method aims to obtain a high-quality solution from two D_i-dimensional individuals by discovering the best combination of their dimensions. Therefore, the OED method works on a two-level, D_i-factor experimental design problem. First, we build an OA L_M(2^{D_i}) by the process introduced in [55]. Second, M individuals are constructed by selecting the corresponding value from x_rk,i or x_mgb,j according to the OA. That is, for the experimental individual x_m (1 ≤ m ≤ M), the kth dimension of x_m is the kth dimension of x_rk,i if OA_m,k = 1 or the kth dimension of x_mgb,j if OA_m,k = 2. Third, the M experimental individuals are evaluated on the ith task, and the best one is denoted as x_b. Then, we perform the factor analysis process and derive a predicted best individual x_p. After x_p is evaluated, the better of x_b and x_p is set as x_OT. Finally, x_OT is compared with x_rk,i. If x_OT is better than x_rk,i, x_OT replaces x_rk,i in pop_i and the reward of this OT process, denoted as r_OT, is set to 1; otherwise, r_OT is set to 0. Inspired by the reinforcement learning technique that uses the accumulated discounted reward to learn an optimal behavior policy [63], we design a parameter update mechanism for p_OT,i,j, given as Eq. (11). In this way, we can adaptively control the intensity of KT. When the OT method brings in a better transferred individual x_OT, p_OT,i,j is increased and the OT process is encouraged. In contrast, p_OT,i,j is reduced if the OT process does not offer any benefit. The complete process of the OT method is shown in Algorithm 2.
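The OED step can be sketched in plain Python. The OA below uses a common two-level construction with M = 2^⌈log2(D_i + 1)⌉ rows, in which every pair of columns contains each level combination equally often; that it matches the procedure of [55] exactly is an assumption. Level 1 takes a dimension from x_rk,i and level 2 from x_mgb,j, factor analysis keeps, for each dimension, the level with the lower mean fitness (minimization is assumed), and the function returns the surviving individual, the reward r_OT, and the M + 1 FE consumed. The update of p_OT,i,j by Eq. (11) is not reproduced, and the function names are illustrative.

```python
import math


def two_level_oa(n_factors):
    """Two-level OA with M = 2 ** ceil(log2(n_factors + 1)) rows, truncated
    to n_factors columns; levels are 1 and 2."""
    u = math.ceil(math.log2(n_factors + 1))
    M = 2 ** u
    L = [[0] * (M - 1) for _ in range(M)]
    for a in range(M):
        for k in range(1, u + 1):
            b = 2 ** (k - 1)                      # 1-based index of a basic column
            L[a][b - 1] = (a // 2 ** (u - k)) % 2
            for s in range(1, b):                 # derived columns b+1 .. 2b-1
                L[a][b + s - 1] = (L[a][s - 1] + L[a][b - 1]) % 2
    return [[v + 1 for v in row[:n_factors]] for row in L]


def orthogonal_transfer(x_rk, f_rk, x_mgb, f):
    """OED on x_rk (level 1) and x_mgb (level 2); returns (x, fx, r_OT, n_fe)."""
    D = len(x_rk)
    oa = two_level_oa(D)
    parents = (x_rk, x_mgb)
    trials = [[parents[lvl - 1][k] for k, lvl in enumerate(row)] for row in oa]
    fits = [f(t) for t in trials]
    b = min(range(len(trials)), key=fits.__getitem__)          # x_b
    # Factor analysis: per dimension, keep the level with the lower mean fitness.
    x_p = []
    for k in range(D):
        mean = [sum(fits[m] for m, row in enumerate(oa) if row[k] == lvl) / (len(oa) / 2)
                for lvl in (1, 2)]
        x_p.append(parents[0][k] if mean[0] <= mean[1] else parents[1][k])
    f_p = f(x_p)
    n_fe = len(trials) + 1                        # M trials plus x_p
    x_ot, f_ot = (x_p, f_p) if f_p < fits[b] else (trials[b], fits[b])
    if f_ot < f_rk:                               # x_OT replaces x_rk,i
        return x_ot, f_ot, 1, n_fe
    return x_rk, f_rk, 0, n_fe
```

On an additively separable function such as the sphere, the factor analysis recovers the better parent value in every dimension from only M + 1 evaluations, instead of the 2^{D_i} that an exhaustive search over combinations would need.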

D. CDT Method
To further exploit the similarity between tasks and obtain knowledge for solving different tasks, we propose the CDT method, which allows knowledge in one dimension of a task to be transferred to a different dimension of another task. The occurrence of the CDT process is adaptively controlled by a probability parameter p_CDT,i,j in the same way as the OT process. In the CDT process from the jth task to the ith task, we calculate the mean and the standard deviation of every dimension of pop_i and pop_j, so that the dth dimension of the ith task is modeled by the Gaussian distribution N(ctr_i,d, std_i,d). Then, we construct the transferred individual, denoted as x_CDT, dimension by dimension. For the dth (1 ≤ d ≤ D_i) dimension of x_CDT, the CDT method transfers knowledge from a similar dimension in the jth task (i.e., the source task). To do this, we first calculate the similarity sim_k of the kth dimension of the jth task to the dth dimension of the ith task as

$$\mathrm{sim}_k = \frac{1}{\mathrm{KL}\left(\mathcal{N}(ctr_{j,k}, std_{j,k}) \,\|\, \mathcal{N}(ctr_{i,d}, std_{i,d})\right) + \varepsilon}$$

where KL(A||B) refers to the Kullback-Leibler divergence [64] between two Gaussian distributions A and B, and the infinitesimal ε = 1e-6 is used to avoid division by zero. Then, a D_j-dimensional probability vector p_rs is created for roulette selection. The kth (1 ≤ k ≤ D_j) dimension of p_rs is calculated as

$$p_{rs,k} = \frac{\mathrm{sim}_k}{\sum_{l=1}^{D_j} \mathrm{sim}_l}$$

Then, a dimension sd of the jth task is selected by roulette selection according to p_rs. Afterward, the dth dimension of x_CDT is randomly generated from the Gaussian distribution N(ctr_j,sd, std_j,sd). In this way, the CDT method transfers information across similar dimensions of the source and target tasks to promote positive KT. After all the dimensions of x_CDT are built, x_CDT is evaluated on the ith task (i.e., the target task) and compared with an individual x_rk,i randomly selected from pop_i. If x_CDT is better than x_rk,i, x_CDT replaces x_rk,i in pop_i and the reward of this CDT process, denoted as r_CDT, is set to 1; otherwise, r_CDT is set to 0.
Finally, similar to the update of p_OT,i,j, the parameter p_CDT,i,j is updated by the feedback r_CDT according to Eq. (14). The complete process of the CDT method is shown in Algorithm 3. Unlike existing methods, we adopt roulette selection to select the dimension of the source task whose distribution is similar to that of the current dimension of the target task. In this way, the CDT method can exploit the similarity between the global optima of the tasks.
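A minimal sketch of building x_CDT is given below. It assumes the similarity form sim_k = 1 / (KL(N(ctr_j,k, std_j,k) || N(ctr_i,d, std_i,d)) + ε) with roulette probabilities proportional to sim_k. The KL direction, the small floor on the standard deviations, and the clipping of samples to [0, 1] are assumptions of this illustration, and the function names are hypothetical.

```python
import math
import random


def kl_gauss(m1, s1, m2, s2):
    """KL(N(m1, s1^2) || N(m2, s2^2)) for two univariate Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def column_stats(pop, floor=1e-12):
    """Per-dimension mean (ctr) and standard deviation (std) of a population."""
    n, D = len(pop), len(pop[0])
    ctr = [sum(x[d] for x in pop) / n for d in range(D)]
    std = [max(math.sqrt(sum((x[d] - ctr[d]) ** 2 for x in pop) / n), floor)
           for d in range(D)]                      # floor avoids log(0) (assumption)
    return ctr, std


def cdt_individual(pop_i, pop_j, eps=1e-6, rng=random):
    """Build x_CDT for target task i by roulette-selecting similar source dimensions."""
    ctr_i, std_i = column_stats(pop_i)
    ctr_j, std_j = column_stats(pop_j)
    x_cdt = []
    for d in range(len(ctr_i)):
        # ASSUMED similarity: inverse of KL(source dim k || target dim d) plus eps.
        sims = [1.0 / (kl_gauss(ctr_j[k], std_j[k], ctr_i[d], std_i[d]) + eps)
                for k in range(len(ctr_j))]
        sd = rng.choices(range(len(ctr_j)), weights=sims)[0]   # roulette selection
        value = rng.gauss(ctr_j[sd], std_j[sd])
        x_cdt.append(min(max(value, 0.0), 1.0))   # clip to [0, 1] (assumption)
    return x_cdt
```

Because sim_k grows without bound as the KL divergence approaches zero, a source dimension whose distribution nearly matches the target dimension dominates the roulette wheel, while dissimilar dimensions are still selectable with small probability.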

E. Complete Algorithm
This section describes the complete procedure of the OTMTO algorithm, which is detailed in Algorithm 4. First, we initialize the population of every task by sampling uniformly from its normalized search space. Then, the populations of the tasks undergo the evolutionary process one by one. Herein, we use the DE/rand/1 algorithm as the problem solver for all tasks. After all the individuals in a population have evolved, we perform the CTM strategy and the OT method to transfer knowledge from the source task to the target task, as shown in lines 7-11. Note that rand in lines 7 and 12 refers to a random number generated within [0, 1]. For KT from other source tasks, we adopt the simple strategy of randomly choosing one task as the source task, as shown in line 6 of Algorithm 4. Next, we perform the CDT method, as shown in lines 12-15. A generation is finished when all populations have completed the evolutionary process, the CTM process, the OT process, and the CDT process. The OTMTO algorithm iterates until the maximum number of FE, denoted as maxFE, is reached. Note that the extra FE caused by the OT method must also be counted, as shown in line 9 of Algorithm 4. Since different tasks might have DDs, we set maxFE = 1000 × D_i.

Algorithm 4 (excerpt):
7 If rand < p_OT,i,j
8 Perform CTM strategy between ith task and jth task to obtain x_mgb,j; //Algorithm 1
9 Perform OT method between ith task and jth task to obtain r_OT and update FE; //Algorithm 2
10 Update p_OT,i,j with r_OT by Eq. (11);
11 End If
12 If rand < p_CDT,i,j
13 Perform CDT method between ith task and jth task to obtain r_CDT and update FE; //Algorithm 3
14 Update p_CDT,i,j with r_CDT by Eq. (14)
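One generation of the DE/rand/1 base solver used for every task can be sketched as follows, with the settings listed in the experimental setting (ps = 100, F = 0.5, Cr = 0.6, binomial crossover, one-to-one elitist selection) and maxFE = 1000 × D_i. This is an illustrative NumPy sketch rather than the authors' implementation: clipping trial vectors to [0, 1] is an assumption, and the KT steps in lines 7-14 of Algorithm 4 are left out.

```python
import numpy as np


def de_rand_1_generation(pop, fit, f, F=0.5, Cr=0.6, rng=None):
    """One in-place DE/rand/1/bin generation (minimization); returns FEs used."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = pop.shape
    for k in range(n):
        r1, r2, r3 = rng.choice([m for m in range(n) if m != k], 3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])      # rand/1 mutation
        mask = rng.random(d) < Cr
        mask[rng.integers(d)] = True               # binomial crossover, >= 1 gene from v
        u = np.clip(np.where(mask, v, pop[k]), 0.0, 1.0)
        fu = f(u)
        if fu <= fit[k]:                           # elitist one-to-one selection
            pop[k], fit[k] = u, fu
    return n


def run_single_task(f, D, ps=100, seed=0):
    """Evolve one task until maxFE = 1000 * D is reached (no knowledge transfer)."""
    rng = np.random.default_rng(seed)
    pop = rng.random((ps, D))
    fit = np.array([f(x) for x in pop])
    fe, max_fe = ps, 1000 * D
    while fe < max_fe:
        fe += de_rand_1_generation(pop, fit, f, rng=rng)
    return fit.min(), fe
```

Returning the FE count from each generation makes it straightforward to add the extra FE of the OT and CDT steps to the same budget, as line 9 of Algorithm 4 requires.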

F. Time Complexity Analysis
The computational cost of OTMTO in a generation comes from DE for the optimization of the ith task (i = 1, . . . , K), the INFE caused by the internal DE in the CTM strategy, and the extra FE caused by the OT method and the CDT method. The computational cost of the population reproduction process in DE is ignored since it is small. Since the cost of an internal fitness calculation mainly comes from the square calculations in (4) and (8), the computational cost of the CTM strategy for the ith task is O(maxINFE · (nb · D + nb)), where D = max({D_1, . . . , D_K}). Furthermore, the extra FE caused by the OT method are O(2^⌈log2(D+1)⌉), and the extra FE caused by the CDT method are O(1) for the ith task. In summary, the worst-case computational cost (i.e., INFE) of the CTM strategy with K tasks in OTMTO is O(K · maxINFE · (nb · D + nb)). This cost can be reduced by decreasing maxINFE and nb, and the experimental study in Section IV-D shows that the performance of OTMTO does not change significantly with different settings of maxINFE and nb. Moreover, the occurrence of the CTM strategy is adaptively controlled by p_OT according to the reward of the transfer. The extra FE caused by the OT and CDT methods are O(K · (2^⌈log2(D+1)⌉ + 1)). Because 2^⌈log2(D+1)⌉ lies between D + 1 and 2D + 1, the extra FE grow approximately linearly with the dimensionality D, which keeps them moderate when D is small (e.g., D < 200). When the extra FE caused by the OT method become large, the performance of OTMTO may deteriorate. How to improve the scalability of OTMTO to high-dimensional problems and many-task problems is a future research direction.
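The OT overhead can be made concrete with a little arithmetic. Assuming that one OT call costs the M trials of the OA plus one evaluation of x_p (the M + 1 FEs also used for the crossover variants in Section IV-C) and that one CDT call costs a single FE, the helper functions below (hypothetical names) compute the worst-case extra FE.

```python
import math


def ot_extra_fe(D):
    """FEs of one OT call: M OA trials plus the predicted individual x_p."""
    M = 2 ** math.ceil(math.log2(D + 1))
    return M + 1


def extra_fe_per_generation(dims):
    """Worst case: OT and CDT (one FE) both fire once for every task."""
    return sum(ot_extra_fe(D) + 1 for D in dims)


for D in (20, 30, 40, 50, 200):
    print(D, ot_extra_fe(D))   # 20 -> 33, 30 -> 33, 40 -> 65, 50 -> 65, 200 -> 257
```

For the MTOP-DD setting D_1 = 30 and D_2 = 40, this gives (33 + 1) + (65 + 1) = 100 extra FE per generation in the worst case, compared with the 2^30 and 2^40 dimension combinations an exhaustive search would have to evaluate.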

IV. EXPERIMENTAL STUDIES
Experimental tests on the CEC17 benchmark, the proposed MTOP-DD benchmark, and a real-world application are carried out in this section to validate the effectiveness and efficiency of the proposed OTMTO algorithm. The performance of OTMTO is compared with that of existing baseline and state-of-the-art EMTO algorithms in the literature.
A. Experimental Setting
1) CEC17 Benchmark: To verify the performance of the proposed OTMTO algorithm, we first run numerical experiments on the CEC17 multitasking benchmark problems, which are widely adopted for benchmarking EMTO algorithms. The CEC17 benchmark includes nine MTOPs, and each problem contains two single-objective optimization tasks. The single-objective optimization functions include Sphere, Rosenbrock, Schwefel, rotated Ackley, rotated Griewank, and rotated Weierstrass. To rotate a function, the solution is left-multiplied by an orthogonal matrix before the function value is calculated. The problems are divided into three groups, and within each group the problems are arranged from the lowest to the highest similarity between the two tasks. As introduced in [65], to calculate the intertask similarity, 1e6 points are sampled in the unified search space U, and the ith solution u_i is decoded into x_1,i and x_2,i in the original search spaces of task 1 and task 2, respectively. Let rank(x_1,i) and rank(x_2,i) denote the ranks of the ith solution with respect to tasks 1 and 2. Then, the similarity, denoted as sim and measured as Spearman's rank correlation coefficient, is calculated as

$$\mathrm{sim} = \frac{\mathrm{cov}(\mathrm{rank}(x_1), \mathrm{rank}(x_2))}{\mathrm{std}(\mathrm{rank}(x_1))\,\mathrm{std}(\mathrm{rank}(x_2))} \qquad (15)$$

In particular, the first group includes three MTOPs with complete intersection. That is, the optimal solutions of the two tasks are the same in the unified search space. The second group includes three MTOPs with partial intersection, in which only some of the dimensions of the optimal solutions of the two tasks are the same. The third group contains three MTOPs with no intersection, in which all dimensions of the optimal solutions of the two tasks are different. Note that all the problems in the CEC17 benchmark except problem 6 have the same dimensionality for the two tasks. For detailed characteristics of these problems, refer to [65].
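Eq. (15) is the Pearson correlation of the two rank vectors. The sketch below assumes that ties are negligible (continuous functions), that each task reads the first D_k coordinates of a point u in the unified space [0, 1]^max(D_1, D_2), and that the task functions accept normalized inputs; it samples 1e4 rather than 1e6 points to stay fast. The function names are hypothetical.

```python
import random


def ranks(values):
    """1-based ranks; ties broken by order (assumed negligible for continuous f)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for pos, idx in enumerate(order):
        r[idx] = pos + 1
    return r


def spearman(f1_vals, f2_vals):
    """Eq. (15): cov(rank1, rank2) / (std(rank1) * std(rank2))."""
    r1, r2 = ranks(f1_vals), ranks(f2_vals)
    n = len(r1)
    m1, m2 = sum(r1) / n, sum(r2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(r1, r2)) / n
    s1 = (sum((a - m1) ** 2 for a in r1) / n) ** 0.5
    s2 = (sum((b - m2) ** 2 for b in r2) / n) ** 0.5
    return cov / (s1 * s2)


def intertask_similarity(f1, f2, D1, D2, n=10_000, seed=0):
    """Sample the unified space; task k reads the first D_k coordinates (assumption)."""
    rng = random.Random(seed)
    D = max(D1, D2)
    us = [[rng.random() for _ in range(D)] for _ in range(n)]
    return spearman([f1(u[:D1]) for u in us], [f2(u[:D2]) for u in us])
```

A value near 1 means the two tasks order the sampled points almost identically, whereas a value near 0 means their fitness landscapes are unrelated.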
2) Proposed MTOP-DD Benchmark: Since the existing benchmark cannot meet the need for testing EMTO algorithms on MTOP-DD, we propose a highly configurable benchmark test suite tailored to MTOP-DD. Following the design of CEC17, we consider two kinds of similarity between tasks, i.e., the global optima similarity and the function landscape similarity. Let x*_gb,i and x*_gb,j be the global optima of the ith task and the jth task, respectively. To measure the relative global optima similarity, we denote the number of identical dimensions in x*_gb,i and x*_gb,j as nsd_i,j. Then, the global optima similarity of the ith task relative to the jth task, denoted as sim_go,i,j, is calculated as

$$\mathrm{sim}_{go,i,j} = \frac{nsd_{i,j}}{D_i}$$

Since D_i may not be equal to D_j, sim_go,i,j is not necessarily equal to sim_go,j,i. sim_go,i,j reflects the potential benefit to the ith task of transferring knowledge from the jth task. The function landscape similarity, denoted as sim_fl,i,j, is measured by Spearman's rank correlation coefficient according to (15), computed on 1e6 points sampled in the unified search space in the same way as [65]. In particular, the proposed benchmark includes nine two-task optimization problems whose tasks can have DDs. Similar to the CEC17 benchmark, they are divided into three groups with different levels of global optima similarity, and within each group the problems have different fitness landscape similarities. In our experiments, we use the settings D_1, D_2 ∈ {20, 30, 40, 50}, which yield 4 × 4 × 9 = 144 multitask problems with different task dimensionalities and different functions for a comprehensive comparison. The detailed data of the benchmark, including the rotation matrices and the optima biases, can be found in the supplementary material. The functions used in the nine problems and the similarities between the tasks of the MTOP-DD with D_1 = 30 and D_2 = 40 are shown in Table I as an example.
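Assuming the normalization sim_go,i,j = nsd_i,j / D_i, the asymmetry can be seen in a small example. How the benchmark decides that two optima share a dimension is not specified in this section; comparing coordinates index by index over the first min(D_i, D_j) dimensions, within a tolerance, is an assumption of this sketch, and the function name is hypothetical.

```python
def global_optima_similarity(opt_i, opt_j, tol=1e-12):
    """sim_go,i,j = nsd_i,j / D_i (assumed index-wise matching of coordinates)."""
    shared = min(len(opt_i), len(opt_j))
    nsd = sum(1 for k in range(shared) if abs(opt_i[k] - opt_j[k]) <= tol)
    return nsd / len(opt_i)


opt_1 = [0.0] * 30               # D_1 = 30
opt_2 = [0.0] * 20 + [0.5] * 20  # D_2 = 40, shares 20 coordinates with opt_1
# nsd = 20, so sim_go,1,2 = 20/30 while sim_go,2,1 = 20/40 = 0.5
```

The lower-dimensional task receives the larger similarity because the same shared coordinates cover a larger fraction of its optimum.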
3) Compared Algorithms: KT is the key component of EMTO algorithms. An effective KT method can improve the search performance on multiple tasks compared with single-task optimization algorithms. Therefore, to validate the advantage brought by the proposed KT method, OTMTO is compared with the single-objective evolutionary algorithm (SOEA) proposed in [65], which is a single-task optimization algorithm. Note that the EC algorithm used in SOEA is DE/rand/1, the same as the DE used in OTMTO, so that the comparison reflects the effects of the proposed KT methods. The compared SOEA using DE is denoted as SODE. For a comprehensive comparison, we also implement multiple EMTO algorithms. First, MFEA [22] is regarded as the baseline EMTO algorithm; it is based on the USR and uses implicit KT. Next, several state-of-the-art EMTO algorithms are implemented and compared with OTMTO. MFEA2 [33] is an improved version of MFEA that uses an online parameter estimation strategy. The EMTO algorithm with explicit autoencoder (EMTEA) proposed in [28] also uses a multipopulation framework and DE/rand/1 as the base solver. Since EMTEA and OTMTO use different mapping methods to facilitate KT between tasks with heterogeneous search spaces, this comparison shows the advantage of the proposed CTM strategy and KT methods. Furthermore, MTGA [25] is a simple and efficient multipopulation-based EMTO algorithm. For a fair comparison, all the parameter settings of the compared algorithms are the same as those in their original papers. For the OTMTO algorithm, the experimental settings are as follows.
1) Maximum FEs: maxFE = 1000 × D_i.
2) EC Algorithm for Each Task: DE/rand/1, ps = 100, F = 0.5, Cr = 0.6, mutation scheme = rand/1, crossover scheme = binomial crossover, and selection scheme = elitism selection.
3) Initial KT Probability: p_OT,i,j = p_CDT,i,j = 0.5, i ≠ j, and i, j ∈ {1, 2}.
4) EC Algorithm for the CTM Process: DE/best/1, nb = 5, inps = 50, F = 0.5, Cr = 0.9, mutation scheme = best/1, crossover scheme = binomial crossover, selection scheme = elitism selection, and maxINFE = 500.

4) Performance Measure: All algorithms terminate and output the historical best fitness for the two tasks once the maximum number of FE is reached in a single run. To reduce the bias brought by randomness, every algorithm is run 20 times independently, each time with a different random seed. The mean values of the best fitness obtained by the EMTO algorithms over the 20 independent runs are used for comparison. The Wilcoxon rank-sum test at a significance level of 0.05 is carried out on the experimental results. Furthermore, to quantitatively evaluate the performance of the EMTO algorithms, the performance metric of CEC17 [65] is also used to analyze the experimental results.
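The "+", "=", and "−" labels used in the result tables can be produced from two samples of final fitness values. The sketch below implements the two-sided Wilcoxon rank-sum test with the normal approximation (average ranks for ties, no tie correction), which is one common way to run it. The function names and the use of the sample mean to decide the direction are assumptions, and in practice a library routine such as SciPy's ranksums could be used instead.

```python
import math


def average_ranks(values):
    """1-based ranks with ties replaced by their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1    # mean of 1-based ranks i+1 .. j+1
        i = j + 1
    return ranks


def ranksum_symbol(ours, theirs, alpha=0.05):
    """'+', '=' or '-' for minimization via the normal-approximated rank-sum test."""
    n1, n2 = len(ours), len(theirs)
    r = average_ranks(list(ours) + list(theirs))
    w = sum(r[:n1])                               # rank sum of our runs
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # no tie correction
    z = (w - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    if p >= alpha:
        return "="
    return "+" if sum(ours) / n1 < sum(theirs) / n2 else "-"
```

With 20 runs per algorithm, as in the experiments, the normal approximation is usually considered adequate.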

B. Results and Comparison
The experimental results of the OTMTO algorithm and the compared EMTO algorithms on the CEC17 multitask optimization benchmark problems are shown in Table II. There are nine two-task optimization problems in total, which add up to 18 tasks. Note that all the problems in CEC17 except problem 6 are MTOPs whose tasks have the same dimensionality. The listed results are the mean values over 20 independent runs. The symbols "+", "=", and "−" indicate that the OTMTO algorithm is significantly better than, statistically equal to, or significantly worse than the compared algorithm, respectively. The last row of the table summarizes the total numbers of "+", "=", and "−" obtained when comparing OTMTO with each compared algorithm. First, we observe that OTMTO surpasses the single-task SODE on almost all tasks, which shows that the proposed EMTO algorithm achieves positive KT. That is, transferring knowledge from other tasks improves the search performance compared with optimizing the tasks independently. Moreover, OTMTO significantly outperforms the baseline EMTO algorithm MFEA on 16 tasks while performing significantly worse than MFEA on only one task. In addition, OTMTO outperforms MFEA2 on 15 tasks. These results show that OTMTO has clear advantages over MFEA and MFEA2, indicating that the proposed OT method can significantly improve KT quality. Since OTMTO is not the first algorithm to adopt the multipopulation framework, it is also compared with MTGA and EMTEA, which adopt the same framework. The comparative results show that OTMTO outperforms EMTEA on most of the tasks, which indicates the advantage of the CTM strategy and the OT method over the denoising autoencoder in performing positive KT. Furthermore, OTMTO is compared with the state-of-the-art EMTO algorithm MTGA. The results show that OTMTO significantly outperforms MTGA on 12 tasks and is worse than MTGA on only 1 task. Hence, OTMTO is generally better than MTGA on the CEC17 benchmark problems.
Next, we conduct experiments on the proposed MTOP-DD benchmark, which contains 144 multitask problems with different task dimensionalities. In particular, the results on the nine proposed benchmark problems with D_1 = 30 and D_2 = 40 (i.e., the problems in Table I) are shown in Table III. The results on the benchmark problems with the other dimensionality combinations are presented in Tables S.I to S.XV in the supplementary material. Moreover, the Wilcoxon rank-sum test results comparing OTMTO with the other algorithms on the nine proposed benchmark problems with D_1, D_2 ∈ {20, 30, 40, 50} (i.e., 4 × 4 = 16 combinations in total) are summarized in Table S.XVI in the supplementary material.
As can be observed from Table III, OTMTO achieves positive KT on most of the MTOP-DDs compared with the single-task algorithm SODE. Moreover, OTMTO significantly outperforms MFEA, MFEA2, EMTEA, and MTGA on most of the tasks, which demonstrates the improvement brought by the CTM strategy and the OT method. Note that EMTEA performs even worse than the single-task algorithm SODE on some tasks, such as task 2 of problem 3, which indicates negative KT. This shows that the existing EMTO algorithms are not well suited for MTOP-DD and further validates the contribution of this article. The effectiveness of OTMTO is also confirmed on the MTOP-DDs with other settings of D_1 and D_2: Table S.XVI in the supplementary material shows that OTMTO outperforms the compared algorithms on most of the problems. This shows that OTMTO is an effective EMTO algorithm for handling MTOP-DD. Moreover, the quantitative analysis [65] is provided in Section C of the supplementary material; its conclusion is that OTMTO achieves the best overall performance among all the EMTO algorithms.
To further show the advantage of OTMTO in search efficiency, the convergence curves of the competing EMTO algorithms on several test problems are plotted in Fig. 2. It can be observed that OTMTO obtains good solutions faster during the search process than the state-of-the-art EMTO algorithms. The transfer probabilities of the OT and CDT methods (i.e., p_OT and p_CDT) during the search process are also shown in Fig. 2. p_CDT tends to increase in the early stage and decrease in the late stage of the optimization process, which indicates that the CDT method is useful in the early stage, helping to locate promising regions quickly by transferring useful dimensions. In contrast, the general trend of p_OT in Fig. 2 is increasing. This indicates that the CTM strategy together with the OT method consistently provides positive transfer, whether the two tasks of the test problem have heterogeneous fitness landscapes (e.g., problem 5 of the CEC17 benchmark) or DDs (e.g., problem 2 of the proposed MTOP-DD benchmark).
Moreover, to test the scalability of the OTMTO algorithm, we carry out experiments on three-task optimization problems. The detailed problem settings and the results are given in Table S.XIX in the supplementary material. The results show that OTMTO significantly outperforms SODE and MFEA.

C. Effects of the Components
In this section, we investigate the effects of the components in the OTMTO algorithm, which are the CTM strategy, the OT method, and the CDT method.
To validate the effectiveness of the CTM strategy, we formulate multiple OTMTO variants that perform the mapping by a single-layer linearized autoencoder, a kernelized autoencoder, and an affine transformation, denoted as OTMTO-LA, OTMTO-KA, and OTMTO-AT, respectively. These variants differ from OTMTO only in the scheme that maps x_gb,j to x_mgb,j. Specifically, in OTMTO-LA, the mapping between tasks is implemented by left-multiplying by a transformation matrix calculated in the same way as in EMTEA. In OTMTO-KA, the mapping is achieved by learning a nonlinear polynomial kernel as in [29]. In OTMTO-AT, the mapping is based on an affine transformation as in [66]. Note that when the two tasks have different dimensionalities, we pad the solutions of the lower-dimensional task with zeros so that both tasks have the same dimensionality, as these mapping methods require. Moreover, an OTMTO variant named OTMTO-ITS1, which performs intratask sampling rather than CTM, is formulated to investigate the effectiveness of the CTM strategy. That is, OTMTO-ITS1 has no CTM; the CTM process is replaced by sampling an individual from the Gaussian distribution of the population of the target task, while the other processes, including the OED and CDT processes, remain the same as in OTMTO.
To validate the effectiveness of the OT method, we also formulate two OTMTO variants that transfer knowledge by simulated binary crossover (SBX) and uniform binary crossover (UBX), named OTMTO-SBX and OTMTO-UBX, respectively. In particular, the crossover operation is performed on x_rk,i and the mapped individual x_mgb,j obtained from the jth task by the CTM strategy M + 1 times (i.e., with the same number of extra FEs) to obtain the best transferred individual, as the OT method does. The other components, such as CTM and CDT, remain unchanged in these variants.
To validate the effectiveness of the CDT method, an OTMTO variant named OTMTO-ITS2, which performs intratask sampling rather than CDT, is formulated. In OTMTO-ITS2, the CDT process is replaced by sampling an individual from the Gaussian distribution of the population of the target task, while the other processes, including the CTM and OED processes, are the same as in OTMTO. Moreover, a variant named OTMTO-w/o-CDT, in which the CDT process is removed, is formulated.
The average final fitness and the Wilcoxon rank-sum test results over 20 independent runs are shown in Tables IV and V. From Table IV, we observe that OTMTO generally outperforms the variants using other mapping methods, i.e., OTMTO-LA, OTMTO-KA, OTMTO-AT, and OTMTO-ITS1. These results show that the proposed CTM strategy obtains better transferred individuals for the OT process. From Table V, OTMTO performs better than OTMTO-SBX and OTMTO-UBX, which validates the effectiveness of the OT method. Moreover, OTMTO performs better than OTMTO-ITS2 and OTMTO-w/o-CDT, which validates the effectiveness of the CDT method.

D. Parameter Sensitivity
In this section, we investigate the sensitivity of OTMTO to the parameters of the CTM strategy, namely nb and maxINFE. nb controls the quantity of information (the number of top individuals) used to reconstruct the individual in the mapped search space, and maxINFE controls the number of internal FE used to solve the CTM optimization problem. We run experiments on problems 1, 4, and 7 of the proposed MTOP-DD benchmark in Table I with nb ∈ {2, 3, 7, 10, 15} and maxINFE ∈ {100, 1000, 2500, 5000}. The average final fitness over 20 independent runs is shown as a heatmap in Fig. 3. On problem 1, which has a high sim_go,1,2, OTMTO tends to obtain better results with a larger nb. When the tasks have a medium or low sim_go,1,2 (i.e., problems 4 and 7), the performance of OTMTO deteriorates as nb increases. Thus, the appropriate setting of nb appears to be related to the similarity of the global optima of the MTOP-DD. For maxINFE, OTMTO generally tends to obtain better results with a higher maxINFE (e.g., 5000) than with a lower one (e.g., 100). This is because more INFE allow the CTM optimization problem to be solved better, yielding a mapped individual of higher quality. Since the internal optimization process requires extra computational cost, we use the relatively small setting maxINFE = 500, which corresponds to only ten internal generations (500/inps with inps = 50) for solving the CTM optimization problem.

E. Real-World Application Study
To further investigate the performance of the proposed OTMTO algorithm, a real-world application study is carried out. The double pole balancing (DPB) problem is a classic practical problem for testing evolutionary learning systems. In the DPB problem, the objective of a task is formulated as

$$\max_{\pi} \; R(\pi; \tau)$$

where R is the accumulated reward of the controller under the environment of the task, τ is the parameter of the task, and π denotes the controller. A double pole controlling task is identified by the length of the shorter pole l_s, while the longer one is fixed to 1.0 m; hence, τ is represented by l_s. The input of the controller π contains six variables: the cart position, the cart velocity, the rotation angles of the two poles, and the angular velocities of the two poles. The output of the controller is the force applied to the cart. Following the existing work [33], we use a simple three-layer feedforward neural network as the controller π. Note that different hyperparameter settings, such as the number of hidden neurons n_hidden, lead to different controllers with different numbers of parameters (i.e., dimensionalities). For example, n_hidden = 8 means that π contains 56 weights to be learned (i.e., 6 inputs × 8 hidden neurons + 8 hidden neurons × 1 output = 56), so the dimensionality of such an optimization task is 56. In our experiment, we formulate five tasks as follows.
1) T1: l_s = 0.6, n_hidden = 8, and dimensionality = 6 × 8 + 8 × 1 = 56.
2) T2: l_s = 0.65, n_hidden = 8, and dimensionality = 6 × 8 + 8 × 1 = 56.
3) T3: l_s = 0.7, n_hidden = 8, and dimensionality = 6 × 8 + 8 × 1 = 56.
4) T4: l_s = 0.6, n_hidden = 6, and dimensionality = 6 × 6 + 6 × 1 = 42.
5) T5: l_s = 0.65, n_hidden = 6, and dimensionality = 6 × 6 + 6 × 1 = 42.
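The dimensionality counts above follow from a bias-free 6-n_hidden-1 network. The sketch below reproduces them and shows a controller forward pass; the tanh activations and the use of a normalized force in [-1, 1] are assumptions, the names are hypothetical, and the cart-pole simulation that turns this output into a fitness value is not included.

```python
import math


def n_weights(n_hidden, n_in=6, n_out=1):
    """Weights of the bias-free three-layer network, e.g. 6 x 8 + 8 x 1 = 56."""
    return n_in * n_hidden + n_hidden * n_out


TASKS = {  # task: (l_s in m, n_hidden)
    "T1": (0.60, 8), "T2": (0.65, 8), "T3": (0.70, 8),
    "T4": (0.60, 6), "T5": (0.65, 6),
}


def controller(weights, state, n_hidden):
    """Map the 6 state variables to a normalized force in [-1, 1] (tanh assumed)."""
    n_in = len(state)
    w1, w2 = weights[: n_in * n_hidden], weights[n_in * n_hidden:]
    hidden = [math.tanh(sum(w1[h * n_in + c] * state[c] for c in range(n_in)))
              for h in range(n_hidden)]
    return math.tanh(sum(w2[h] * hidden[h] for h in range(n_hidden)))
```

Pairing a 56-weight task with a 42-weight task is what turns the DPB instances into multitask problems with different dimensionalities.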
For FE, a solution in OTMTO and the other EMTO algorithms is a neural network controller defined by its weights, and the fitness of the solution is the time for which the controller keeps the cart stable in the simulation. A task is regarded as solved (i.e., the algorithm is successful) if the controller can be optimized to keep the cart stable for more than 30 min of simulated time. Based on the five formulated tasks, we set up six multitask DPB problems with different combinations of the tasks to test the performance of the EMTO algorithms. The DPB problem instances are denoted as DPB1, DPB2, . . . , and DPB6, as shown in Table VI, and each instance has two tasks. Each EMTO algorithm is run 50 times independently on each DPB instance, and the average success rates over the 50 runs are reported in Table VI, with the best results marked in boldface. In most cases, OTMTO is superior to SODE and EMTEA, which both use DE as the base solver, not only on the problems whose tasks have the same dimensionality but also on those whose tasks have DDs. The proposed OTMTO also outperforms the state-of-the-art MTGA on all problems. These results show the effectiveness of OTMTO on real-world MTOPs.
To further investigate the time complexity of the OTMTO algorithm, the running time that the EMTO algorithms need to obtain the optimal controller is reported in Table S.XX in the supplementary material. The results in Table S.XX show that OTMTO has an advantage in running time, which indicates that the proposed methods in OTMTO are both effective and efficient. Moreover, to test the scalability of OTMTO, we conduct experiments on a five-task DPB problem that contains all five tasks T1, T2, T3, T4, and T5. The results are given in Table VII. OTMTO still outperforms the compared SODE algorithm on four tasks, which shows the positive transfer brought by the proposed methods.

V. CONCLUSION
In this article, we mainly addressed two issues in the design of effective EMTO algorithms for MTOPs. First, the existing KT methods are not well suited to MTOP-DD, which is quite common in real-world applications. Second, the tasks in an MTOP may have different degrees of similarity in different dimensions, and the existing KT methods based on crossover operators with a universal probability can cause negative transfer. To address these issues, we proposed the OTMTO algorithm, which includes the CTM strategy and the OT method. The CTM strategy is carried out before the OT method to map an individual from one search space to another search space with a different dimensionality. In this way, the OT method can be carried out on individuals in the same search space to find their best combination of dimensions. We showed that performing the OED process on the CTM-obtained mapped individual of the source task and a randomly selected individual of the target task is enough to offer high-quality KT. Another advantage of the OT method is that it requires far fewer FE than exhaustive search to find the best combination. Furthermore, we proposed the CDT method, which allows KT between different dimensions of two tasks to improve the KT quality. We verified the effectiveness and efficiency of the proposed OTMTO algorithm on both the commonly used CEC17 MTOP benchmark and the proposed MTOP-DD benchmark. To test the effectiveness of OTMTO on a real-world problem, experiments on the DPB application problem were carried out, and the results showed the superiority of OTMTO. In the future, we will study how to improve and extend the OTMTO algorithm to solve many-task optimization problems effectively and efficiently.

He is currently a Korea Brain Pool Fellow Professor with Hanyang University, Seoul, South Korea. His current research interests include computational intelligence, cloud computing, operations research, and power electronic circuits. He has published more than 150 IEEE Transactions papers in his research areas.