Adaptive Multi-Task Human-Robot Interaction Based on Human Behavioral Intention

Learning from demonstration with Probabilistic Movement Primitives (ProMPs) has been widely used in robot skill learning, especially in human-robot collaboration. Although ProMP has been extended to multi-task settings inspired by the Gaussian mixture model, it still treats each task independently and ignores a common scenario: the robot must adaptively switch between collaborative tasks to follow instantaneous changes of human intention. To solve this problem, we propose an alternate-learning-based parameter estimation method and an empirical minimum-variation decomposition strategy with projection points, combined with a linear interpolation strategy for the weights, all within a Gaussian mixture model framework. Alternate learning of the weights and parameters in multi-task ProMP (MTProMP) lets the robot obtain a smooth composite trajectory plan that crosses the expected via points. The decomposition strategy projects the desired via-point state onto the individual ProMP components so that the total deviation between each projection point and the respective prior mean is minimized. Linear interpolation automatically adjusts the weights between consecutive via points. The proposed method and strategy extend naturally to multi-task interaction ProMPs (MTiProMP). With MTProMP and MTiProMP, a robot can serve multiple tasks in an industrial setting and collaborate with a worker, switching from one task to another as the human's intention changes. Classical via-point trajectory planning experiments and human-robot collaboration experiments are performed on the Sawyer robot; the results show that MTProMP and MTiProMP with the proposed method and strategy outperform their single-task counterparts.


I. INTRODUCTION
Robots are deployed in diverse settings, not only in factories but also in homes, hospitals, etc. In these settings, robots are usually expected to perform various tasks according to commands and context. Meanwhile, there is a trend of people entering the working environment of robots and interacting with them [1]. All of this indicates an urgent need for rapid robot-skill learning methods, since they endow robots with an ability to acquire and refine skills akin to humans. Learning from demonstration (LfD) is a significant research paradigm in fast robot-skill learning and has attracted widespread interest in the field of human-robot collaboration [2]. Movement Primitives (MPs) are mature methods for modeling robot movement policies in LfD. MPs have been widely deployed in manipulation [3], locomotion [4], human-robot collaboration [5], and other robot tasks thanks to their generalization to new tasks, temporal modulation of the movement, co-activation of multiple primitives, and other desirable properties. (The associate editor coordinating the review of this manuscript and approving it for publication was Li He.)
A probabilistic formulation of MPs (Probabilistic Movement Primitives, ProMPs) was proposed in [6]. With ProMP, the modulation of a movement can be achieved by controlling the target positions and velocities. It couples the multiple joints of the robot by modeling the covariance between the trajectories of several degrees of freedom. Maeda et al. [7], [8] utilized Interaction ProMP, abbreviated iProMP, to build joint distributions between the human and the robot. IProMP can guide the robot to execute a specific task once the trajectories of several joints of the human upper limb are observed. However, iProMP cannot be applied in multi-task settings. With this in mind, Khoramshahi and Billard [9] proposed weighted dynamical system (DS) models, in which each DS model represents an independent task; a cost function updates the weight of each task during the collaboration, so the robot can switch tasks via the contact force between the human and the robot.
Many researchers have also studied task switching from other perspectives. Suzuki et al. [10] utilized two DNNs handling sensory and instruction signals to enable switching between subtask dynamics; by switching among four subtasks, the robot can complete a clothes-folding task. Mohseni-Kabir and Veloso [11] formalize the task-switching problem as an MDP and apply a DQN to approximate a task-switching policy; the core of their method is to identify the task-switching stimuli. However, these approaches involve somewhat complicated networks, and the robots cannot comply with changes of human intention instantaneously.
Under non-contact collaboration, Maeda et al. [12] proposed a mixture of iProMPs. It first selects the most likely task based on the trajectory of the collaborator's hand and then performs the collaboration as in the single-task case. However, it has a critical flaw: it cannot switch from one task to another when the collaborator changes his behavioral intention. Rueckert et al. [13] found that a given set of tasks shares much structure and that the parameters of ProMP depend on a lower-dimensional latent control variable. Their latent manifold ProMPs (LMProMPs) with hierarchical priors can handle high-dimensional problems and enable efficient generalization in multi-task learning, but they still do not settle the task-switching problem. Thus, endowing robots with the ability to switch tasks in response to changes of human behavioral intention in contact-free collaboration scenarios remains a significant research problem.
In this paper, we propose an alternate learning parameter estimation method and a projection-point strategy for the MTProMP and MTiProMP models, inspired by the Gaussian Mixture Model (GMM) and linear basis function fitting. In MTProMP and MTiProMP, each ProMP or iProMP model represents a separate task: MTProMP is designed for robots that conduct multiple tasks, and MTiProMP is suited to multi-task human-robot collaboration. During the demonstration learning phase, MTProMP builds the relationship among the ProMPs, and MTiProMP establishes the relationship among the iProMPs. During the predicting phase, the proposed alternate learning method based on the EM algorithm [14] is used to update the weight and parameters of each ProMP or iProMP. The two models are applied to implement fast, adaptive task switching in single-robot scenarios and human-robot collaboration scenarios, respectively.

II. PRELIMINARY
In this section, we briefly introduce the structure of ProMP and iProMP and their parameter estimation methods.

A. THE STRUCTURE OF PROMP
The ProMP model is Bayesian linear regression combined with Gaussian basis functions, and it is a movement primitive that describes a trajectory in terms of a probability distribution. For a single degree of freedom, ProMP takes the form

$$q_t = \begin{bmatrix} q_t \\ \dot{q}_t \end{bmatrix} = \begin{bmatrix} \phi_t^T \\ \dot{\phi}_t^T \end{bmatrix} w + \epsilon_q = \Phi_t^T w + \epsilon_q, \quad (1)$$

where $q_t$ includes the joint position $q_t$ and velocity $\dot{q}_t$ at time $t$. The state $q_t$ can be extended beyond joint position and velocity: Paraschos [15] extended the state $q_t = [q_t, \dot{q}_t]^T$ to $q_t = [q_t, \dot{q}_t, u_t]^T$, where $u_t$ is the control input of the robot. In this paper, $q_t$ will be extended to $q_t = [x_t, y_t, z_t]^T$, the Cartesian coordinates. $\phi_t \in \mathbb{R}^N$ is the vector of Gaussian basis functions ($N$ is the number of basis functions) evaluated at time $t$, and $\dot{\phi}_t \in \mathbb{R}^N$ is the corresponding derivative. $w \in \mathbb{R}^N$ is the parameter of the ProMP; it contains the weight of each basis function.
$\epsilon_q \sim \mathcal{N}(0, \Sigma_q)$ is Gaussian noise with zero mean. For a problem with $Q$ degrees of freedom (DoF), the ProMP takes the form

$$y_t = [q_{1,t}^T, \cdots, q_{Q,t}^T]^T = H_t^T \omega + \epsilon_y, \quad (2)$$

where $y_t \in \mathbb{R}^{2Q}$ is the observation variable with $Q$ DoFs, $q_{j,t}$ are the position and velocity of the $j$th DoF, $H_t^T = \mathrm{diag}(\Phi_{1,t}^T, \cdots, \Phi_{Q,t}^T)$ is the observation matrix, and $\omega \in \mathbb{R}^{NQ}$ is the parameter of the multivariate ProMP. The ProMP model is a form of policy representation in the framework of Imitation Learning (IL), so the parameter $\omega$ can be learned from demonstrations. For each DoF, the parameter $w_j$ can be computed by regularized least squares:

$$w_j = \left( \Phi_{j,*}^T \Phi_{j,*} + \lambda I \right)^{-1} \Phi_{j,*}^T q_{j,*}, \quad (3)$$

where $\Phi_{j,*} = [\Phi_{j,1}, \cdots, \Phi_{j,T}]^T \in \mathbb{R}^{2T \times N}$ ($T$ is the number of time steps, and the superscript $T$ denotes the matrix transpose), $\lambda$ is the regularization term, and $q_{j,*} = [q_{j,1}^T, \cdots, q_{j,T}^T]^T \in \mathbb{R}^{2T}$. Stacking the $w_j$ over all DoFs yields the parameter $\omega$ of the ProMP.
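A minimal sketch of this fitting step may help. The snippet below assumes a position-only trajectory for a single DoF (no velocity rows) and uses illustrative function names; it fits the basis weights by regularized least squares as described above, not as the authors implemented it:

```python
import numpy as np

def gaussian_basis(T, N, h=0.02):
    """Evaluate N normalized Gaussian basis functions at T time steps in [0, 1]."""
    t = np.linspace(0.0, 1.0, T)[:, None]        # (T, 1) phase values
    c = np.linspace(0.0, 1.0, N)[None, :]        # (1, N) basis centers
    b = np.exp(-(t - c) ** 2 / (2.0 * h))        # (T, N) unnormalized activations
    return b / b.sum(axis=1, keepdims=True)      # each row sums to 1

def fit_promp_weights(q, N=20, lam=1e-6):
    """Ridge-regression fit of one DoF's position trajectory q to basis weights w."""
    Phi = gaussian_basis(len(q), N)              # (T, N) basis matrix
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ q)
    return w, Phi

# Reconstruct a demonstrated trajectory from its learned weights.
q_demo = np.sin(np.linspace(0.0, np.pi, 100))
w, Phi = fit_promp_weights(q_demo)
print(np.max(np.abs(Phi @ w - q_demo)))          # small reconstruction error
```

Extending this to the full state of equation-style $[q_t, \dot{q}_t]^T$ only requires stacking the derivative rows $\dot{\phi}_t^T$ into the basis matrix.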

B. THE STRUCTURE OF IPROMP
The purpose of ProMP is to build relationships between the weights of the basis functions, but only one agent is considered. Unlike ProMP, iProMP encompasses multiple agents, such as the human and the robot, and connects the agents' parameters. During human-robot collaboration, assume the human has $P$ DoFs and the robot has $Q$ DoFs. The form of iProMP can be written as

$$\bar{y}_t = \left[ (y_t^H)^T, (y_t^R)^T \right]^T = \bar{H}_t^T \bar{\omega} + \epsilon_y, \quad (4)$$

where variables with superscript $H$ represent the data of the human and those with superscript $R$ represent the data of the robot. $\bar{H}_t^T = \mathrm{diag}(\Phi_{1,t}^T, \cdots, \Phi_{P+Q,t}^T)$ is the observation matrix with $(P+Q)$ DoFs, and $\bar{\omega}$ is the parameter of iProMP; it stacks the parameters of the human ProMP and the parameters of the robot ProMP.
Similar to ProMP, the parameters of the $j$th DoF of the human and the $k$th DoF of the robot can be obtained by equation (5), the analogue of equation (3):

$$w_j^H = \left( (\Phi_{j,*}^H)^T \Phi_{j,*}^H + \lambda I \right)^{-1} (\Phi_{j,*}^H)^T q_{j,*}^H, \qquad w_k^R = \left( (\Phi_{k,*}^R)^T \Phi_{k,*}^R + \lambda I \right)^{-1} (\Phi_{k,*}^R)^T q_{k,*}^R, \quad (5)$$

where $\Phi_{j,*}^H$ and $\Phi_{k,*}^R$ are the stacked Gaussian basis function matrices of the human and the robot, respectively.
Finally, we can obtain the parameter $\bar{\omega}$ of iProMP from the pre-collected demonstrations. As for the observation matrices $H_t^T$ and $\bar{H}_t^T$, their values can be precomputed and stored for retrieval when needed.

C. ESTIMATION OF PARAMETERS IN PROMP AND IPROMP
In this section, we introduce the estimation of new parameters for ProMP and iProMP in the parameter predicting phase.
First, taking ProMP as an example, we assume that the parameter $\omega$ follows a Gaussian distribution $p(\omega) = \mathcal{N}(\omega \mid \mu_\omega, \Sigma_\omega)$ and that there are $M$ demonstrations. The mean and covariance of $\omega$ are

$$\mu_\omega = \frac{1}{M} \sum_{i=1}^{M} \omega_i, \qquad \Sigma_\omega = \frac{1}{M} \sum_{i=1}^{M} (\omega_i - \mu_\omega)(\omega_i - \mu_\omega)^T, \quad (6)$$

where $\omega_i$ is the parameter of the ProMP fitted to the $i$th demonstration.
According to the structure of ProMP, the distribution of $y_t$ given $\omega$ is $p(y_t \mid \omega) = \mathcal{N}(y_t \mid H_t^T \omega, \Sigma_y)$. We construct a new variable $x = [\omega^T, y_t^T]^T$ with $x \sim \mathcal{N}(x \mid \mu, \Sigma)$. The mean and covariance of $x$ are

$$\mu = \begin{bmatrix} \mu_\omega \\ H_t^T \mu_\omega \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_\omega & \Sigma_\omega H_t \\ H_t^T \Sigma_\omega & \Sigma_y + H_t^T \Sigma_\omega H_t \end{bmatrix}, \quad (7)$$

so the marginal distribution of $y_t$ is

$$p(y_t) = \mathcal{N}\!\left( y_t \mid H_t^T \mu_\omega, \; \Sigma_y + H_t^T \Sigma_\omega H_t \right). \quad (8)$$

When a via point $y_t^*$ is observed with variance $\Sigma_y^*$, estimating the parameter of the ProMP amounts to computing the posterior $p(\omega \mid y_t^*)$:

$$\mu_\omega^{new} = \mu_\omega + K \left( y_t^* - H_t^T \mu_\omega \right), \quad (9)$$

$$\Sigma_\omega^{new} = \Sigma_\omega - K H_t^T \Sigma_\omega, \qquad K = \Sigma_\omega H_t \left( \Sigma_y^* + H_t^T \Sigma_\omega H_t \right)^{-1}. \quad (10)$$

Similarly for iProMP, when the human state has been observed, the update resembles equations (9)-(10):

$$\mu_{\bar{\omega}}^{new} = \mu_{\bar{\omega}} + G \left( \bar{y}_t^* - (\bar{H}_t^{(obs)})^T \mu_{\bar{\omega}} \right), \qquad \Sigma_{\bar{\omega}}^{new} = \Sigma_{\bar{\omega}} - G (\bar{H}_t^{(obs)})^T \Sigma_{\bar{\omega}}, \quad (11)$$

where $G$ is the Kalman gain and $\bar{H}_t^{(obs)}$ distinguishes the observable part of the human from the unobservable part of the robot.
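The Gaussian conditioning step above can be sketched in a few lines. The snippet below is a simplified illustration with hypothetical names, assuming the Kalman-gain form of the posterior update; it is not the authors' implementation:

```python
import numpy as np

def condition_promp(mu_w, Sigma_w, H_t, y_star, Sigma_obs):
    """Condition the weight prior N(mu_w, Sigma_w) on an observed via point
    y* = H_t^T w + noise, using the Kalman-gain form of the Gaussian posterior:
        K       = Sigma_w H_t (Sigma_obs + H_t^T Sigma_w H_t)^{-1}
        mu_new  = mu_w + K (y* - H_t^T mu_w)
        Sig_new = Sigma_w - K H_t^T Sigma_w
    """
    S = Sigma_obs + H_t.T @ Sigma_w @ H_t        # innovation covariance
    K = Sigma_w @ H_t @ np.linalg.inv(S)         # Kalman gain
    mu_new = mu_w + K @ (y_star - H_t.T @ mu_w)
    Sigma_new = Sigma_w - K @ H_t.T @ Sigma_w
    return mu_new, Sigma_new
```

With a near-zero observation variance the posterior mean reproduces the via point almost exactly, which is how a planned trajectory is forced through a set point.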
In the process of collaboration, when the human state is observed, we estimate the expectation of the parameter, $\mu_{\bar{\omega}}$, and then compute the corresponding robot state, which is used to drive the robot.

III. A MIXTURE OF MOVEMENT PRIMITIVES
In this section, we first introduce the defects of ProMP and iProMP. To overcome these defects, we then introduce a mixture of MP models, applying it to ProMP and iProMP, respectively.

A. DEFECTS OF PROMP AND IPROMP
The parameters of both ProMP and iProMP are learned from measured demonstrations. Empirically, when the variance of the demonstrations is either too small or too large, it is difficult to extract effective features from them. If the pre-collected demonstrations are close to each other and have a small variance, the demonstrated skill can be learned well, but it is challenging to generalize to skills outside the training set. If the demonstrations are scattered and have a large variance, the two models above (ProMP and iProMP) cannot learn the hidden skills; the visible symptom is that the robot cannot accurately pass the set via points with those demonstrations.
One of the trajectory planning experiments with ProMP, shown in Fig. 1, illustrates the effect of the demonstrations' variance. In Fig. 1 and the following joint-angle figures, the x-axis represents time steps (each time step is 0.05 seconds) and the y-axis represents the joint angle in radians. After learning from demonstrations, we set four via points, denoted by blue dots; the new task is to make the trajectory pass these points on time. As expected, the solid black trajectory intersects all the via points. At time step 80, however, the via point is set outside the variance of the demonstrations, which causes the entire planned trajectory to deviate far from the training set. When this planned trajectory is used to guide the robot, it leads to wide joint-position fluctuations, meaning the robot either consumes too much energy or risks mechanical wear. This experiment shows the poor generalization ability of the ProMP method when the demonstration variance is low. The same problem exists in iProMP.

B. MTPROMP AND MTIPROMP
In order to overcome the influence of demonstration variance, we introduce a mixture of MPs. Taking the ProMP model as an example, we call the mixture of ProMPs MTProMP (multi-task ProMP); it is a weighted combination of multiple ProMPs, each representing one task. During the demonstration learning phase, a Gaussian Mixture Model rather than a single Gaussian distribution is used to model the parameter $\omega$, and each component of the GMM represents a single ProMP. Suppose that MTProMP consists of $K$ ProMPs; the distribution of $\omega$ is then

$$p(\omega) = \sum_{k=1}^{K} a^{(k)} \mathcal{N}\!\left( \omega \mid \mu^{(k)}, \Sigma^{(k)} \right), \quad (13)$$

where $z$ is a $K$-dimensional binary random variable with a 1-of-$K$ representation (a particular $z_k$ equals 1 and all other elements equal 0), $a^{(k)} = p(z_k = 1)$ is the weight of the $k$th ProMP, and $\mu^{(k)}$ and $\Sigma^{(k)}$ are the mean and covariance of the $k$th ProMP's parameter. Thus, the purpose of the demonstration learning phase is to train a GMM model of $\omega$.
In the single-ProMP case, we estimate the initial parameter with equation (6). In MTProMP, however, the EM algorithm is the appropriate tool for estimating the parameters. During the E-step of the training phase, we introduce $\tau_{ik}$, the probability that the $i$th demonstration belongs to the $k$th component of the GMM, also called the responsibility:

$$\tau_{ik} = \frac{a^{(k)} \mathcal{N}(\omega_i \mid \mu^{(k)}, \Sigma^{(k)})}{\sum_{j=1}^{K} a^{(j)} \mathcal{N}(\omega_i \mid \mu^{(j)}, \Sigma^{(j)})}. \quad (14)$$

During the M-step of the training process, we update the parameters of the GMM:

$$\mu^{(k)} = \frac{\sum_{i=1}^{M} \tau_{ik} \, \omega_i}{\sum_{i=1}^{M} \tau_{ik}}, \qquad \Sigma^{(k)} = \frac{\sum_{i=1}^{M} \tau_{ik} (\omega_i - \mu^{(k)})(\omega_i - \mu^{(k)})^T}{\sum_{i=1}^{M} \tau_{ik}}, \quad (15)$$

$$a^{(k)} = \frac{1}{M} \sum_{i=1}^{M} \tau_{ik}. \quad (16)$$

We obtain the parameters of the GMM model of $\omega$ by performing the E-step and M-step iteratively until convergence. The GMM model of $\omega$ divides the original ProMP into $K$ classes, and each sub-ProMP has its weight $a^{(k)}$, mean $\mu^{(k)}$, and covariance $\Sigma^{(k)}$. To summarize, the structure of MTProMP can be described as

$$p\!\left( y_t \mid a^{(1:K)}, \omega^{(1:K)} \right) = \sum_{k=1}^{K} a^{(k)} \, p\!\left( y_t \mid \omega^{(k)} \right), \quad (17)$$

where $a^{(1:K)}$ denotes $\{a^{(1)}, a^{(2)}, \ldots, a^{(K)}\}$ and $\omega^{(1:K)}$ denotes $\{\omega^{(1)}, \omega^{(2)}, \ldots, \omega^{(K)}\}$. As shown in equation (2), the state $y_t$ is completely governed by the parameter $\omega$ once the basis function matrix $H_t^T$ is fixed, so one $\omega$ uniquely determines a ProMP. MTProMP categorizes the original demonstrations into $K$ classes by clustering the parameters $\omega$ with the GMM. In the multi-task case, each class represents a distinct task. When the variance of the demonstrations within one task is large, each class instead represents a cluster covering a small portion of the demonstrations with an appropriate covariance. In this way, richer demonstrations help construct the probabilistic model of the robot. The probabilistic graphical model of MTProMP is shown in Fig. 2; for every data point $\omega_m$ there is a corresponding latent variable $z_m$. Similarly, the mixture-of-MP strategy can be applied to iProMP; we call the new structure MTiProMP (multi-task iProMP).
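As an illustration of this demonstration-learning phase, the following sketch fits a K-component GMM to demonstration weight vectors by EM. It is a simplified, self-contained version (deterministic initialization along the first coordinate, a small covariance regularizer), with illustrative names, not the authors' code:

```python
import numpy as np

def em_gmm(W, K, iters=50):
    """Fit a K-component GMM to demonstration weight vectors W (M, D) by EM.
    Each component plays the role of one sub-ProMP: (a_k, mu_k, Sigma_k)."""
    M, D = W.shape
    a = np.full(K, 1.0 / K)
    # Deterministic init: spread the K means along the first coordinate.
    order = np.argsort(W[:, 0])
    mu = W[order[np.linspace(0, M - 1, K).astype(int)]].copy()
    Sigma = np.stack([np.cov(W.T) + 1e-6 * np.eye(D)] * K)
    for _ in range(iters):
        # E-step: responsibilities tau_ik ∝ a_k N(w_i | mu_k, Sigma_k), in log space.
        logp = np.stack([
            -0.5 * np.einsum('ij,jk,ik->i', W - mu[k],
                             np.linalg.inv(Sigma[k]), W - mu[k])
            - 0.5 * np.linalg.slogdet(Sigma[k])[1] + np.log(a[k])
            for k in range(K)], axis=1)
        tau = np.exp(logp - logp.max(axis=1, keepdims=True))
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, covariances.
        Nk = tau.sum(axis=0)
        a = Nk / M
        mu = (tau.T @ W) / Nk[:, None]
        for k in range(K):
            d = W - mu[k]
            Sigma[k] = (tau[:, k, None] * d).T @ d / Nk[k] + 1e-6 * np.eye(D)
    return a, mu, Sigma
```

Run on two well-separated clusters of fitted weight vectors, the components recover the two tasks with roughly equal mixture weights.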
The structure of MTiProMP can be described as

$$p\!\left( \bar{y}_t \mid a^{(1:K)}, \bar{\omega}^{(1:K)} \right) = \sum_{k=1}^{K} a^{(k)} \, p\!\left( \bar{y}_t \mid \bar{\omega}^{(k)} \right). \quad (18)$$

Referring to the relationship between ProMP and iProMP in Section II, the method for learning the parameters $a^{(k)}$ and $\bar{\omega}^{(k)}$ of MTiProMP in the demonstration learning phase can be migrated from MTProMP.

IV. PARAMETER ESTIMATION BASED ON ALTERNATE LEARNING AND A DECOMPOSITION STRATEGY WITH PROJECTION POINTS
In this section, we propose an alternate learning method and a decomposition strategy to estimate the parameters of MTProMP and MTiProMP in the parameter predicting phase. In the first step, we update the weight $a^{(k)}$ of each ProMP or iProMP in weight space; in the second step, we update the parameter $\omega^{(k)}$ in parameter space so that each component passes its decomposed projection point.

A. ESTIMATION OF PARAMETERS IN MTPROMP
We first discuss MTProMP and design a trajectory planning experiment in which the trajectory must pass specified via points. In Section II-C, under the assumption that $\omega$ and $y_t^*$ obey Gaussian distributions, ProMP and iProMP construct a new variable $x = [\omega^T, y_t^T]^T$, which obeys a joint Gaussian distribution; via Gaussian inference, the posterior distribution $p(\omega \mid y_t^*)$ can be computed. In MTProMP, however, the parameter $\omega$ is modeled with a GMM, i.e., $\omega$ is a combination of $K$ components $\omega^{(k)}$. The joint-Gaussian hypothesis therefore no longer holds, and the estimation method cannot be transferred from ProMP.
According to equation (2), if the parameter $\omega$ follows a GMM, the state $y_t$, a linear transformation of $\omega$, also follows a GMM. We denote the via point at time $t$ by $y_t^*$. The ultimate purpose of MTProMP, in mathematical form, is to maximize the probability $p(y_t^* \mid a^{(1:K)}, \omega^{(1:K)})$ by optimizing the weights $a^{(1:K)}$ and the parameters $\omega^{(1:K)}$ of the ProMPs. In the predicting phase, when the via points $y_t^*$ are observed, the main effort is to maximize the log-likelihood

$$L(\theta) = \log p\!\left( y_t^* \mid a^{(1:K)}, \omega^{(1:K)} \right) = \log \sum_{k=1}^{K} a^{(k)} p\!\left( y_t^* \mid \omega^{(k)} \right) \quad (19)$$

with respect to $\theta = \{a^{(1:K)}, \omega^{(1:K)}\}$.
Intuitively, we apply an EM-style algorithm to estimate the parameters. We introduce a latent variable $z = [z_1, z_2, \ldots, z_K]^T$ with each $z_k \in \{0, 1\}$ and $\sum_{k=1}^{K} z_k = 1$, meaning that exactly one $z_k$ takes the value 1 and the others take the value 0. In addition, we use $p(z_k = 1) = a^{(k)}$, abbreviated $q_k(z)$, to represent the probability of the $k$th ProMP. As in the EM algorithm, we apply Jensen's inequality to equation (19):

$$L(\theta) \geq \sum_{k=1}^{K} q_k(z) \log \frac{a^{(k)} p(y_t \mid \omega^{(k)})}{q_k(z)} = Q(\theta), \quad (20)$$

where $Q(\theta)$ denotes the lower bound of the log-likelihood $L(\theta)$ and $\sum_{k=1}^{K} q_k(z) = 1$. The inequality in equation (20) becomes an equality when $a^{(k)} p(y_t \mid \omega^{(k)}) = c \times q_k(z)$ holds, where $c$ is a constant, $c = \sum_{k=1}^{K} a^{(k)} p(y_t \mid \omega^{(k)})$. Hence the distribution $q_k(z)$ of the latent variable $z$ is

$$q_k(z) = \frac{a^{(k)} p(y_t \mid \omega^{(k)})}{\sum_{j=1}^{K} a^{(j)} p(y_t \mid \omega^{(j)})}. \quad (21)$$

In the EM algorithm, it is the iterative maximization of the lower bound that guarantees convergence, and the same strategy is effective here. To maximize $Q(\theta)$, we alternately optimize the weights $a^{(1:K)}$ and the parameters $\omega^{(1:K)}$:

$$\max_{\theta} \; \sum_{k=1}^{K} q_k(z) \log \left( a^{(k)} p(y_t \mid \omega^{(k)}) \right), \quad (22)$$

where the term $\sum_{k=1}^{K} q_k(z) \log q_k(z)$ is omitted for simplicity, since it does not depend on $\theta$. Intuitively, equation (22) can be regarded as maximizing the expectation of $\log a^{(k)} p(y_t \mid \omega^{(k)})$ under $q_k(z)$.
In the E-step, we compute the distribution $q_k(z)$ of the latent variable $z$ with equation (21), fixing the weight $a^{(k)}$ and the likelihood $p(y_t \mid \omega^{(k)})$.
In the M-step, there are some differences from the traditional EM algorithm. We first update the weight $a^{(k)}$ with the same equation as (16); here, however, we have only one observed data point, so $M$ is set to 1 and the update reduces to

$$a^{(k)}_{new} = \tau_{1k} = q_k(z). \quad (23)$$

Then we update the parameter $\omega^{(k)}$ of each ProMP with $a^{(k)}$ fixed so as to maximize $Q(\theta)$. It is tempting to force every ProMP to pass the via points, which makes each $p(y_t \mid \omega^{(k)})$ maximally probable; the weighted average expectation $Q(\theta)$ would then be maximized in a single iteration. Although this simple strategy is feasible, its requirements are too strict.
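The single-observation weight update can be sketched as follows. The snippet assumes Gaussian components, so each likelihood $p(y_t \mid \omega^{(k)})$ is the marginal of equation-(8) form; function and variable names are illustrative:

```python
import numpy as np

def update_task_weights(y_star, a, mus, Sigmas, H_t, Sigma_y):
    """E-step / M=1 weight update: the new weight of component k is its
    responsibility for the single observed via point,
        a_k_new ∝ a_k N(y* | H_t^T mu_k, Sigma_y + H_t^T Sigma_k H_t).
    Computed in log space for numerical stability."""
    logp = np.empty(len(a))
    for k, (mu, Sig) in enumerate(zip(mus, Sigmas)):
        S = Sigma_y + H_t.T @ Sig @ H_t          # marginal covariance of y_t
        d = y_star - H_t.T @ mu                  # innovation
        logp[k] = (np.log(a[k]) - 0.5 * d @ np.linalg.solve(S, d)
                   - 0.5 * np.linalg.slogdet(S)[1])
    p = np.exp(logp - logp.max())                # shift max to 0 before exp
    return p / p.sum()                           # normalized responsibilities
```

A via point lying deep inside one task's demonstrations drives that task's weight toward 1, matching the behavior described for the experiments below.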
According to equation (17), $y_t$ obeys the GMM model and can be seen as the weighted average of multiple ProMPs, or tasks. From another perspective, if each ProMP is regarded as a basis function, a via point (an observed $y_t^*$) can be viewed as a point in the space spanned by the ProMPs. Our strategy is to let each ProMP pass the projection point $y_t^{(k)}$ of $y_t^*$ on the $k$th ProMP basis; the crucial task is to pinpoint these projection points. Section III-A analyzed the drawbacks of a single ProMP. Generally speaking, we want each projection point $y_t^{(k)}$ to lie as close as possible to the mean $(y_t^{(mean)})^{(k)}$ of its ProMP. That is, under the constraint that the weighted sum of the projections passes through the set via point, the smaller the cumulative deviation from the projection points $y_t^{(k)}$ to the means $(y_t^{(mean)})^{(k)}$, the better the decomposition strategy. Each ProMP then only needs to pass its own projection point $y_t^{(k)}$, which is close to its mean and is likely to fall within its variance range. Fig. 3 shows a schematic diagram of the proposed method (here the y-axis does not represent a joint angle): the red X-shaped dots are the projection points of the blue via point, and they are close to the corresponding black mean points. This method ensures that each ProMP has a smooth planned trajectory with small fluctuation. As a result, the weighted trajectory is smooth even when the via point is set between two tasks; in this way, smooth task switching is attained.
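One concrete way to realize this minimum-deviation criterion, assuming deviations are measured by unweighted squared Euclidean distance, is a closed-form Lagrange-multiplier solution; the authors' empirical strategy may differ in its exact weighting, so treat this as a sketch:

```python
import numpy as np

def decompose_via_point(y_star, a, means):
    """Split a via point y* into per-component projection points y_k such that
    sum_k a_k y_k = y* while sum_k ||y_k - m_k||^2 is minimal. Lagrange
    multipliers give the closed form
        y_k = m_k + a_k / (sum_j a_j^2) * (y* - sum_j a_j m_j)."""
    a = np.asarray(a, dtype=float)               # (K,) mixture weights
    means = np.asarray(means, dtype=float)       # (K, D) component means at time t
    residual = y_star - a @ means                # how far the weighted mean misses y*
    return means + np.outer(a / np.sum(a ** 2), residual)

# The weighted average of the projections recovers the via point exactly.
a = np.array([0.3, 0.7])
means = np.array([[0.0, 1.0], [2.0, 3.0]])
proj = decompose_via_point(np.array([1.0, 2.0]), a, means)
print(a @ proj)                                  # -> [1. 2.]
```

Note that each projection moves away from its mean in proportion to its weight, so a component with near-zero weight barely leaves its prior, exactly the behavior the schematic in Fig. 3 suggests.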
For scenarios with $V$ via points, without loss of generality the via points can be stored as a $V$-entry key-value dictionary $D_V$ of time-point pairs $\{(t_j, y_{t_j}^*)\}_{j=1}^{V}$. In contrast to equation (9), each component is updated by conditioning on its own projection point:

$$\mu_{\omega^{(k)}}^{new} = \mu^{(k)} + K^{(k)} \left( y_t^{(k)} - H_t^T \mu^{(k)} \right), \qquad \Sigma_{\omega^{(k)}}^{new} = \Sigma^{(k)} - K^{(k)} H_t^T \Sigma^{(k)}, \quad (24)$$

with $K^{(k)} = \Sigma^{(k)} H_t \left( \Sigma_y^* + H_t^T \Sigma^{(k)} H_t \right)^{-1}$. Unlike the EM algorithm, we execute only one iteration for the sake of speed, which does not exacerbate the computational burden; the later experiments demonstrate relatively good performance.
The coefficients $a_{t_j}^{(k)}$ ($j \in \{1, \cdots, V\}$) are designed only at the specific via points, and they generally differ from one via point to the next. For the smoothness of the weighted trajectory and to avoid sudden changes, a linear interpolation strategy is utilized. For two or more tasks, the linear interpolation strategy easily ensures that the weights of the different tasks sum to 1. Next, we prove that $\sum_{k=1}^{K} a^{(k)} = 1$ at any time step $t$.
Since there are $V$ via points in $D_V$, without loss of generality select two consecutive via points at times $t_j$ and $t_{j+1}$, whose associated weights $a_{t_j}^{(k)}$ and $a_{t_{j+1}}^{(k)}$ each sum to 1 over $k$. For $t \in [t_j, t_{j+1}]$, the interpolated weight is

$$a_t^{(k)} = a_{t_j}^{(k)} + \frac{t - t_j}{t_{j+1} - t_j} \left( a_{t_{j+1}}^{(k)} - a_{t_j}^{(k)} \right), \quad (25)$$

so $\sum_{k=1}^{K} a_t^{(k)} = \sum_{k} a_{t_j}^{(k)} + \frac{t - t_j}{t_{j+1} - t_j} \left( \sum_{k} a_{t_{j+1}}^{(k)} - \sum_{k} a_{t_j}^{(k)} \right) = 1 + 0 = 1$.
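The interpolation step and its normalization property can be sketched in a few lines; names are illustrative:

```python
import numpy as np

def interpolate_weights(t, t_a, a_a, t_b, a_b):
    """Linearly interpolate mixture weights between consecutive via points at
    times t_a and t_b. With s = (t - t_a)/(t_b - t_a), if a_a and a_b each sum
    to 1, then sum_k [(1 - s) a_a_k + s a_b_k] = (1 - s) + s = 1."""
    s = (t - t_a) / (t_b - t_a)
    return (1.0 - s) * np.asarray(a_a) + s * np.asarray(a_b)

# Weights stay normalized along the whole segment between two via points.
for t in (30, 40, 50, 60, 70):
    a_t = interpolate_weights(t, 30, [0.9, 0.1], 70, [0.2, 0.8])
    print(t, a_t, a_t.sum())
```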

B. ESTIMATION OF PARAMETERS IN MTIPROMP
We discussed the parameter estimation of MTProMP in the last subsection; here we discuss MTiProMP. According to Section III-B and equation (18), the parameter estimation strategy of MTiProMP can be migrated from MTProMP, and the EM algorithm is again utilized. Unlike MTProMP, MTiProMP predicts the robot's state according to the observed human state, also called the human behavioral intention; the human state is a part of the variable $\bar{y}_t$. In the E-step, we calculate the distribution $q_k(z)$ of the latent variable $z$:

$$q_k(z) = \frac{a^{(k)} p(\bar{y}_t \mid \bar{\omega}^{(k)})}{\sum_{j=1}^{K} a^{(j)} p(\bar{y}_t \mid \bar{\omega}^{(j)})}. \quad (26)$$
In the M-step, we first estimate the weight $a^{(k)}$ of each iProMP with respect to the human behavioral intention; as in equation (21), $a^{(k)}$ is set equal to $q_k(z)$. Assume the observed state is $\bar{y}_t^* = [(y_t^{*,H})^T, (y_t^{*,R})^T]^T$, where $y_t^{*,H} = [(q_{1,t}^H)^T, \ldots, (q_{P,t}^H)^T]^T$ and $y_t^{*,R} = [(0_{1,t}^R)^T, \ldots, (0_{Q,t}^R)^T]^T$.
We then decompose $y_t^{*,H}$ onto each iProMP, denoted $(y_t^H)^{(k)}$; the robot part of the state remains a zero vector because it is unobservable. The new observation state for each iProMP is $(\bar{y}_t)^{(k)} = [((y_t^H)^{(k)})^T, 0^T]^T$, and the prediction equations for the parameters $\mu_{\bar{\omega}^{(k)}}^{new}$ and $\Sigma_{\bar{\omega}^{(k)}}^{new}$ of each iProMP are

$$\mu_{\bar{\omega}^{(k)}}^{new} = \mu^{(k)} + G^{(k)} \left( (\bar{y}_t)^{(k)} - (\bar{H}_t^{(obs)})^T \mu^{(k)} \right), \qquad \Sigma_{\bar{\omega}^{(k)}}^{new} = \Sigma^{(k)} - G^{(k)} (\bar{H}_t^{(obs)})^T \Sigma^{(k)}, \quad (27)$$

where $G^{(k)}$ is the Kalman gain of the $k$th component.
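The full MTiProMP prediction step, conditioning each component on the observed human part, reweighting by the likelihood of the observation, and reading out the robot part, can be sketched as below. This is a simplified single-time-step illustration with hypothetical names and without the projection-point decomposition:

```python
import numpy as np

def predict_robot_state(y_human, a, mus, Sigmas, H_obs, Sigma_obs, H_robot):
    """Condition each iProMP component on the observed human state (through
    the observation matrix H_obs), reweight the components by the likelihood
    of that observation, and return the weighted robot-state prediction
    obtained through H_robot."""
    K = len(a)
    logp = np.empty(K)
    mus_new = []
    for k in range(K):
        S = Sigma_obs + H_obs.T @ Sigmas[k] @ H_obs   # innovation covariance
        d = y_human - H_obs.T @ mus[k]                # innovation
        logp[k] = (np.log(a[k]) - 0.5 * d @ np.linalg.solve(S, d)
                   - 0.5 * np.linalg.slogdet(S)[1])
        G = Sigmas[k] @ H_obs @ np.linalg.inv(S)      # Kalman gain
        mus_new.append(mus[k] + G @ d)                # posterior mean of omega_k
    w = np.exp(logp - logp.max())
    w /= w.sum()                                      # updated task weights
    return sum(w[k] * (H_robot.T @ mus_new[k]) for k in range(K))
```

When the human clearly performs one of the demonstrated tasks, the corresponding component dominates the weighted prediction, which is the task-switching behavior exercised in the experiments below.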

V. EXPERIMENTS
Several via-point experiments and human-robot collaboration experiments were conducted with a 7-DoF Sawyer robot and its camera.
In the first experiment, we applied MTProMP to a task in which the robot arm must pass set via points in joint space at specified times. This experiment shows that MTProMP can switch from task A to task B smoothly. In the second experiment, we apply MTiProMP to perform two human-robot collaboration tasks simultaneously; it shows that the robot switches from task A to task B when the human switches behavioral intention from task A to task B. In all the experiments in this paper, we sample 100 time steps with a time interval of 0.05 seconds.

A. PASSING SET VIA POINTS WITH MTPROMP
For simplicity, we first apply MTProMP to one joint to validate feasibility. The experiment lasts 100 time steps. In Fig. 4, the light red and light green lines represent the demonstrations of task 1 and task 2, respectively; each task has 26 demonstrations. Red and blue lines represent the mean trajectories of the demonstrations. The blue dots mark the set via points at time steps 1, 30, 70, 80, and 100. The solid lines represent the mean trajectory of each task, and the dashed lines represent the trajectories planned with ProMP for each task. Clearly, the planned trajectories fluctuate over a very large range, which is bad for robots. Fig. 5 to Fig. 7 show the trajectory planning process of MTProMP. The black dots represent the mean joint angles of each task learned from the demonstrations at time step t. In Fig. 5, the blue via point at time step 30 is decomposed into two pink X-shaped dots, and the black dashed weighted-average trajectory passes the blue via point successfully. In Fig. 6, the blue dot coincides with one of the pink dots at time step 100; this is because the blue dot lies within the demonstrations of task 1 and far from task 2, so the weight $a^{(1)}$ approaches 1 and $a^{(2)}$ approaches 0, as clearly seen in Fig. 7. The red and green dots represent the weights of the two tasks corresponding to the via points at the same time steps. The weight is only updated at those specific moments; for the smoothness of the weighted trajectory and to avoid sudden changes, a linear interpolation strategy is used to fit the weight curve, which for two or more tasks ensures the task weights sum to 1. In summary, as the set via points range from task 2 to task 1, the weights $a^{(k)}$ change, and the main contributor to passing the blue via points switches from task 2 to task 1.
Comparing the y-axis coordinates of these figures, we notice that the black dashed weighted trajectory has a smaller fluctuation range. We also apply MTProMP to the 7-DoF Sawyer robot for the via-point passing task. In Fig. 8, the joints modeled by ProMP cannot pass the two set via points completely; in Fig. 9, however, the black dashed planned trajectories pass all set via points at time steps 40 and 100 successfully, which validates the performance of MTProMP. MTProMP eliminates the impact of the huge variance introduced by the demonstrations.

B. TASK-SWITCHING BASED ON HUMAN BEHAVIORAL INTENTION WITH MTIPROMP
In the multi-task human-robot collaboration experiments, we adopt the same strategy as in Section V-A. For simplicity, we first consider the single-joint case. Fig. 10 and Fig. 11 show human-robot collaboration with one joint. In Fig. 10, the human first executes task 2, starts to switch from task 2 to task 1 at time step 35, and finally executes task 1 from time step 70. In MTiProMP, the robot trajectory is predicted according to the observed human trajectory; the predicted robot trajectory is shown in Fig. 11. The robot first moves along the green trajectory of task 2 and then switches to the red trajectory of task 1. The task-switching process is steady and smooth, which preliminarily proves the feasibility of MTiProMP for multi-task human-robot collaboration and task switching. Fig. 12 to Fig. 14 depict the experiment in which a human performs the collaboration task with the 7-DoF Sawyer robot; Fig. 18 shows the real Sawyer robot in collaboration. Here we use the Cartesian coordinates of the collaborator's hand, as mentioned in Section II-A. In the demonstration learning phase, all demonstrations are collected from collaborations between the real Sawyer and the human; we then estimate the parameters of MTiProMP. In the prediction phase, we ask the human collaborator to switch from task 2 to task 1. It should be noted that the human behavioral intention is limited to the demonstrated tasks. Fig. 12 shows the coordinates of the human hand in Cartesian space as estimated by the camera, while Fig. 13 (the method of [12]) and Fig. 14 (the method proposed in this paper) illustrate the angle of each joint in the robot's joint space. Task 1 and task 2 correspond to different directions of hand and robot end-effector movement that share the same initial state; the settings of task 1 and task 2 are shown in Fig. 17 and Fig. 18. Fig. 17 shows the simulation of the Sawyer in CoppeliaSim, where the robot smoothly switches from task 2 to task 1.
Fig. 19 presents the movement track of the real Sawyer robot's end-effector, a plastic suction cup. In this collaboration, the human hand and the robot end-effector move from their respective initial positions to the same target position. The actual human trajectory switches from task 2 to task 1 in Cartesian space, and the corresponding task switching of the robot is reflected in the robot's joint space; the trajectories of the robot joint angles are inferred from the observed human behavioral intention. The trajectories in Fig. 13 shift suddenly at the switch from task 2 to task 1, which can be considered a hard switch. Fig. 15 and Fig. 16 show the Cartesian coordinates of the robot end-effector and the human hand. The actual robot trajectory in Fig. 15 contains a straight segment traversed in a single time step, which is intolerable on a real robot. In contrast, the trajectory inferred with MTiProMP is smooth, realizing smooth task switching.
Taken together, the above two experiments validate the performance of MTiProMP in multi-task human-robot collaboration and in task switching based on human behavioral intention.

VI. CONCLUSION
We presented the MTProMP model for multi-task robot manipulation and the MTiProMP model for multi-task human-robot collaboration. We also proposed an alternate learning method and a decomposition strategy with projection points, combined with a linear interpolation strategy for the weights, to estimate the weights and parameters of the two models. We showed that MTProMP effectively overcomes the defects of ProMP when the demonstrations have a large variance or cover two or more tasks, and that MTiProMP is highly competent for task switching in human-robot collaboration. Classic via-point experiments and human-robot collaboration experiments with the Sawyer robot were carried out to prove the feasibility of the proposed models and methods.
There are a few interesting future research directions along this topic. First, in our experiments, a large number of demonstrations are needed to estimate the parameters of MTProMP and MTiProMP. To improve data efficiency, we could integrate the idea proposed in [17] with the proposed algorithms: a Normal-Wishart prior and a Dirichlet prior would provide prior information and reduce the number of demonstrations required. Second, the current contribution of this paper focuses on ideas and theory. In practical operation, ordinary damped least squares and limiting functions are used for robot singularity avoidance and for safe output to the physical mechanisms. Although this works well, these are passive implementations; active optimization that incorporates such restrictions directly into our proposed algorithms will be refined in subsequent studies.
JIAN FU (Member, IEEE) received the M.Eng. degree in computer application from the Huazhong University of Science and Technology, Wuhan, China, in 1999, and the Ph.D. degree in control theory and control engineering from the University of Science and Technology Beijing, Beijing, China, in 2006. He is currently an Associate Professor with the School of Automation, Wuhan University of Technology. His current research interests include human-robot interaction, intelligent robot, and reinforcement learning.
JINYU DU received the B.S. degree from Wuhan University of Technology, China, in 2018, where he is currently pursuing the master's degree with the Department of Control Science and Engineering. His research interests include reinforcement learning, human-robot interaction, and data-efficient robot learning.
XIANG TENG received the B.S. degree from East China University of Technology, China, in 2017, and the master's degree from Wuhan University of Technology, China, in 2020. His research interests include deep learning and human-robot interaction.
YUXIANG FU is currently pursuing the B.S. degree in computer science with The University of British Columbia, Vancouver, BC, Canada. His current research interests include machine learning and algorithmic analysis.
LU WU received the M.Eng. degree in circuit and system and the Ph.D. degree in communication and information system from Wuhan University of Technology, Wuhan, China, in 2007 and 2018, respectively. She is currently a Lecturer with the School of Information Engineering, Wuhan University of Technology. Her research interests include machine learning and computer vision. VOLUME 9, 2021