Interactive Imitation Learning of Bimanual Movement Primitives

Performing bimanual tasks with dual robotic setups can drastically increase the impact on industrial and daily life applications. However, performing a bimanual task brings many challenges, like synchronization and coordination of the single-arm policies. This article proposes the Safe, Interactive Movement Primitives Learning (SIMPLe) algorithm, to teach and correct single or dual arm impedance policies directly from human kinesthetic demonstrations. Moreover, it proposes a novel graph encoding of the policy based on Gaussian Process Regression (GPR) where the single-arm motion is guaranteed to converge close to the trajectory and then towards the demonstrated goal. Regulation of the robot stiffness according to the epistemic uncertainty of the policy allows for easily reshaping the motion with human feedback and/or adapting to external perturbations. We tested the SIMPLe algorithm on a real dual-arm setup where the teacher gave separate single-arm demonstrations and then successfully synchronized them only using kinesthetic feedback or where the original bimanual demonstration was locally reshaped to pick a box at a different height.


I. INTRODUCTION
Modern society is faced with the lack of workforce in various repetitive jobs like re-shelving products in supermarkets or handling heavy luggage in airports. Robots appear to be the most promising solution to mitigate the negative effects of the declining workforce and perform these various complex tasks [1]. To work in variable and unstructured environments, robots must be dexterous and intelligent to quickly learn the job while interacting safely with other robots, objects, and humans. However, traditional task-specific robot programming by experts fails to achieve such dexterity and intelligence due to the time-consuming process and poor adaptability of tailored solutions.
Recent advances in machine learning, namely in Learning from Demonstrations (LfD), have enabled robots to learn directly from (non-expert) human demonstrations without needing complex task-specific programming or long and dangerous exploration. Branching from LfD, Interactive Imitation Learning (IIL) approaches [2] allow human teachers to provide interactive demonstrations and corrections to the robot, exploiting the advantage that corrections are much more sample-efficient than demonstrations, thus reducing the burden on the human teacher. IIL methods cover many feedback modalities (e.g., corrective, evaluative, and qualitative), can be used to learn different models (e.g., policies and objective functions), and leverage several function approximators (e.g., Neural Networks (NNs), Dynamic Movement Primitives (DMPs), Hidden Markov Models (HMMs), and Gaussian Processes (GPs)). While tasks which require only one arm have been explored extensively in the literature, more complex tasks which require a bimanual setup have only recently been targeted. Among such tasks, picking large objects in unstructured environments [1], assisting the elderly [3], [4], surgery tasks [5], and complex assembly tasks [6] have been shown to require dexterous bimanual setups. Factory assembly, logistics, and household applications of bimanual robots have been known for decades [7], [8].
However, the increased number of Degrees of Freedom (DoFs) (the curse of dimensionality) implies an increased teaching complexity and the necessity of skilled human teachers who know how to interface with the bimanual robotic platform.
In this paper, we contribute the Safe Interactive Movement Primitive Learning (SIMPLe) algorithm and propose:
• The design of a bimanual impedance controller with variable Cartesian stiffness; safety constraints on the maximum applicable force and execution velocity are also formulated;
• A novel movement primitive formulation that allows efficiently learning long-horizon tasks from a single demonstration and executes the motion in a reactive way;
• Efficient corrections of the robot's policy directly from kinesthetic feedback, allowing for fine-tuning the demonstrations. Thanks to this, the user can show single-arm trajectories and fine-tune them when transferring the policies onto a bimanual task.
To validate the proposed method, we conducted a series of experiments. The first three are technical experiments related to the main contributions that highlight and test different functionalities of the method. The last two are supplementary user studies: one evaluates the type of data input for the proposed method by comparing two human demonstration approaches, and the other compares giving corrections with giving new demonstrations.
These additional insights can provide a better understanding of the input data generation method and of the adjustments of the robot's skill for bimanual cases.

II. RELATED WORKS
A. Bi-Manual Teaching Frameworks
As with single arms, pre-planning and manual coding of multi-arm manipulation is a tedious process. An alternative is learning from human demonstrations, where a user can guide the robot on how to execute the desired tasks. However, when the user controls a dual (or multiple) robot setup, the physical and cognitive load increases drastically. Using priors, shared control, or task scaffolding, i.e., dividing the teaching into smaller parts, can substantially decrease the demonstrator's workload and make the teaching easier and the learning faster.
Recent works on the control side of bimanual manipulation leverage shared control strategies for reducing the burden of teleoperated bimanual tasks. For example, [9] proposes a shared controller for helping the user to perform bimanual manipulation: it maintains the manipulators' relative position (or orientation) while the user controls the translations or rotations. Similarly, [10] classifies human demonstrations into four teaching modalities: self hand-over, one-hand fixed, one-hand seeking, and fixed offset; when performing teleoperation, a trained classifier detects the most likely modality and adapts the constraints of the bimanual controller accordingly.
On the side of shared control, [11] extends the Roboturk platform by having each arm teleoperated by a different teacher, reducing the cognitive load and enabling teaching tasks with more than two arms. Moreover, ongoing research [12] presents a controller which enables inputs from a teleoperating user and local kinesthetic perturbations. In our work, we focus on teaching bimanual policies from a single human teacher by teaching single-arm policies independently and then interactively reshaping them for successful coordination or adaptation to a new scenario. The goal is to enable non-expert users to teach complex bimanual tasks.

B. Bimanual Coordination Policies
During autonomous execution, disturbing one of the arms in a detached bimanual system can break the synchrony of the movements, making it necessary to provide both movement recovery and re-synchronization capabilities. The way the policy is encoded, e.g., time-dependent vs. position-dependent, or the chosen function approximator, e.g., DMPs, HMMs, or GPs [13], can change the disturbance rejection of the robot.
To this end, the method in [14] uses a prior on the relative position of the two manipulators and a timing dependence in the HMM formulation to synchronize the movement of arm manipulators. Other approaches propose to create a "leader and follower" movement by adding a coupling term [15], a regulation term [16], or a deterministic encoding of trajectories with DMPs [17]. Alternatively, the epistemic uncertainty of GPs can be used for switching the behaviour of the arms from follower to leader (and vice-versa) [18]. This leader-follower learning paradigm makes the system react differently according to which arm is perturbed. Alternatively, the symmetry of the task prior can be used for easily encoding and synchronizing the task. For example, [19] proposes a bimanual policy for picking and throwing non-stationary objects by learning a symmetric dynamical system policy. In this case, perturbing either of the two arms would always make the other react.
Other approaches focus on achieving synchrony and coordination by segmenting the trajectories and reproducing them in sequence or according to a hierarchical representation of the task. The advantage of such approaches is that the sequencing provides an implicit synchronization on a higher level, making the lower-level problem easier. A common approach for this scheme is to learn policies for performing pre-defined subtasks, and a higher-level policy which creates a sequence from demonstrations [20], [21]. Alternatively, the task can have a pre-defined structure of sub-tasks based on heuristics, and synchrony is achieved with a sub-task scheduler [22]. Segmentation has also been used for deep-learning bimanual tasks in [23], where lower-level policies are learned for each segment and higher ones for sequencing them. In this direction, [24] proposes a framework for multi-arm task-space control with smooth transitions from independent behaviors, e.g., when reaching goals, to dependent ones, e.g., when performing a dual-arm manipulation.
Our proposed approach differs from the approaches mentioned above in two ways. First, these approaches fall under the LfD category while our proposed SIMPLe framework is an IIL algorithm, and to the best of our knowledge, SIMPLe is the first framework for learning bimanual tasks from interactive corrections. Second, our interactive framework avoids heuristics for coordinating policies for each arm in a bimanual setup by using human feedback to regulate each arm's dynamics before transferring it to a bimanual policy. Then, when the bimanual policy is executed, the robot's reaction to disturbances depends on the mechanical coupling of the end-effectors (see Section IV-C), or on the chosen input state for the policy (see Section III).

C. Motion Stability
The stability of the bimanual operation is another key aspect. When learning from a small amount of data, in particular, the stability of the learned behaviour can be jeopardized when demonstrations are imperfect. In [25], [26], an LfD approach is combined with a learned controller that adapts the motion to keep the learned trajectory stable when facing external forces. In [19], the motion is divided into one Dynamical System (DS) for each sub-goal with a hand-designed vector field that always brings the robot close to the lines connecting the sub-goals. Our proposed formulation aims to learn long-horizon Movement Primitives (MPs) with only one final goal and to obtain the stability property as an emergent behaviour of the motion encoding (Section III-C).
Next, Section III introduces the novel GP-based formulation used for modeling MPs, Section IV introduces the proposed SIMPLe algorithm and how we use it for interactively learning bimanual MPs, Section V shows different applications and use cases, and Section VI concludes the article with final remarks and future work.

III. MOVEMENT REPRESENTATION
Section III-A presents the proposed Graph Gaussian Process (GGP) formulation, Section III-B the proposed trajectory learning framework and its benefits for safety, Section III-C presents the stability achieved with the proposed framework, and Sections III-D and III-E compare learning trajectories using traditional GPs and the proposed GGPs.

A. Movement Learning with Gaussian Process
To learn the model of the demonstrated trajectories, we chose GPs because they are a flexible non-parametric regression method where the kernel choice can be used to increase the inductive bias on the generalization to unseen points, which is prohibitive using function approximators such as DMPs or NNs. Furthermore, their solid statistical formulation provides both the mean and the epistemic uncertainty of the prediction [13], which can be used for disturbance rejection or stiffness regulation [27].
Given the training data composed of a set of states $X$ and their respective labels $Y$, the prediction mean and variance at the evaluation point $x$ follow, respectively:
$$\mu = k_\star^\top K^{-1} Y, \qquad (1)$$
$$\Sigma = k - k_\star^\top K^{-1} k_\star, \qquad (2)$$
where $k = k(x, x)$ is the variance of a single evaluation point $x \notin X$, $k_\star = k(X, x)$ is the covariance between $x$ and the training inputs $X$, and $K = K(X, X)$ is the covariance matrix of the training data representing the learned model [13]. Note that $k$, $k_\star$, and $K$ are based on the kernel functions and their hyper-parameters, which are used for incorporating prior knowledge into the process. In particular, the kernel determines the interpolation and extrapolation behaviours, and when using a distance-based kernel, e.g., the Radial Basis Function (RBF), the prediction far from the data converges to the prior mean of the Gaussian Process, usually set to zero. Our objective is to have a mean function that extrapolates without losing the measure of epistemic uncertainty, i.e., does not return a vanishing prediction. For this reason, we propose correlating each point with only its closest neighbor in the dataset, changing the kernel definition to:
$$\tilde{k}(x_i, x_j) = \begin{cases} 1 & \text{if } x_j \in X \text{ and } k(x_i, x_j) = \max_{x_l \in X} k(x_i, x_l), \\ 0 & \text{if } x_j \in X \text{ otherwise}, \\ k(x_i, x_j) & \text{if } x_j \notin X. \end{cases} \qquad (3)$$
In simple terms, given a point $x_i$, the correlation with $x_j \in X$ is 1 only if that is the maximum obtainable correlation when correlating $x_i$ with all the points in $X$. With the new kernel, the prior covariance matrix becomes:
$$\begin{bmatrix} \tilde{K} & k_\star \\ \tilde{k}_\star^\top & k \end{bmatrix}.$$
Note that, since the last column is for $x_j \notin X$, the saturation is not applied. Thus, the resulting prior covariance matrix is not symmetric anymore, making the new process a pseudo-GP. After conditioning on the data points, the new pseudo-GP posterior becomes:
$$\mu = \tilde{k}_\star^\top \tilde{K}^{-1} Y, \qquad \Sigma = k - \tilde{k}_\star^\top \tilde{K}^{-1} k_\star.$$
In simple terms, $\tilde{k}_\star$ selects as mean the label of the closest point in the database, computing the uncertainty according to the relative position between the query and the selected point.
Additionally, by saturating the covariance matrix $K$, each trajectory element has its highest correlation with itself: the new saturated correlation matrix, $\tilde{K}$, is the identity matrix, thus eliminating the computationally heavy $O(n^3)$ matrix inversion. However, with this approximation, we lose the interpolation/smoothing properties, meaning that the provided trajectory data must be free of drastic jumps. In practice, recording trajectories at a high enough frequency (> 10 Hz) and/or smoothing the data makes the proposed approximation usable. It is worth mentioning that the presented formulation is tailored for the specific application of movement learning and does not necessarily substitute general approximation methods like local models [28] or variational approximations [29]. A detailed comparison between GPs and GGPs for trajectory learning is presented in Section III-D.
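The saturated-kernel prediction described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `ggp_predict`, the RBF kernel, and the default length scale are assumptions made for the example. The mean is the label of the closest node, and the variance is computed from the unsaturated correlation with that node, so no matrix inversion is needed.

```python
import numpy as np

def ggp_predict(x, X, Y, length_scale=0.05):
    """Return (mean, variance, node index) of the saturated-kernel prediction."""
    x = np.asarray(x, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # For a distance-based kernel the most correlated node is the nearest one;
    # selecting it by distance avoids ties when all correlations underflow to 0.
    sq_dist = np.sum((X - x) ** 2, axis=1)
    i = int(np.argmin(sq_dist))
    # Unsaturated correlation with the selected node (RBF kernel, assumed here).
    k_closest = np.exp(-0.5 * sq_dist[i] / length_scale**2)
    mean = Y[i]                  # label of the closest node
    variance = 1.0 - k_closest   # k(x, x) = 1 for this normalized kernel
    return mean, variance, i
```

Far from the data the mean is still the label of the nearest node, while the variance approaches 1, which is the non-vanishing extrapolation the text asks for.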

B. Representing Trajectories as Graphs
Our goal is to perform safe control during the general or corrective interactions between robots and humans. To that end, we start from a recorded trajectory demonstration, defined as an array of $n$ end-effector poses $\xi = \{x_0, \dots, x_{n-1}\}$ with $x_i \in \mathbb{R}^3$, the timestamp of each respective pose $\tau = \{t_0, \dots, t_{n-1}\}$ with $t_i \in \mathbb{R}$, and a final pose and time $x_n$, $t_n$, used to fit a policy $\pi$. The trajectory can be seen as a sequence of events, represented as a graph with edges representing transitions from the state at time $t_i$ to the state at $t_{i+1}$. Given the adopted GP approximation, during the policy execution, the most correlated point is selected on the trajectory, and its label is selected as the goal, see Fig. 2. We denote the policy as a GGP.
However, the input type of the policy can completely change the robot behaviour. For example, a pose-only "feedback" policy, $\pi_x: x \to x_g$, is a fully reactive policy which computes the next Cartesian pose for the end-effector ($x_g$) based on the current one ($x$). Such policies are safer since they make the robot wait when its path is obstructed and allow it to rejoin the trajectory at its closest point under perturbations [27]. However, they cannot deal with movement ambiguities and time-dependent movements.
Alternatively, a time-only dependent policy, $\pi_t: t \to x_g$, computes $x_g$ based on the current time ($t$). This type of policy can deal with movement ambiguities, e.g., when the demonstrated trajectory crosses itself, and with time-dependent movements, i.e., when the movement has to be temporarily paused at a specific position. However, such "feed-forward" policies are not a safe choice since the attractor moves along the trajectory without considering dangerous interactions with the human and with the environment.
Instead, we propose the usage of pose and time-belief dependent policies, $\pi_{x,t}: (x, t^b) \to (x_g, t^b_g)$, which compute the pose goal and a new time belief $(x_g, t^b_g)$ based on the current ones $(x, t^b)$. Note that the time belief is updated with the time of the selected goal in the trajectory. Encoding both pose and time belief allows for obtaining safe policies capable of handling time-dependent movements and ambiguities.
As such, SIMPLe can be used with models fitted as time-dependent, pose-dependent, or pose and time-dependent policies by setting the GGP states as $X = \tau$, $X = \xi$, or $X = [\xi, \tau]$, respectively, and selecting a kernel for fitting the trajectories w.r.t. time ($k(t, \tau)$), like in [30], position ($k(x, \xi)$), like in [27], or both of them, as proposed in SIMPLe, which is obtained by multiplying the time and the pose-dependent kernels, i.e., $k(x, \xi)\,k(t, \tau)$. In the context of trajectory learning, the labels are set as the states which follow each state in the demonstration, i.e., the label of the $i$-th state is the $(i+1)$-th state of the demonstration.
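To make the pose and time-belief policy concrete, the sketch below evaluates the product of a pose kernel and a time kernel, picks the most correlated node, and returns that node's label as the new goal and time belief. It is an illustration under assumptions: the name `policy_step`, the negative exponential form $\exp(-d/\ell)$, and the toy trajectory are not taken from the source. The demonstration deliberately passes twice through the same position, which a pose-only policy could not disambiguate.

```python
import numpy as np

def policy_step(x, t_b, xi, tau, xi_d, tau_d, ls_x=0.05, ls_t=0.05):
    """One step of a pose and time-belief policy: (x, t_b) -> (x_g, t_b_g)."""
    # Log of the product kernel k(x, xi) * k(t_b, tau); working in log space
    # keeps the argmax meaningful even when the correlations are tiny.
    log_k = -np.linalg.norm(xi - x, axis=1) / ls_x - np.abs(tau - t_b) / ls_t
    i = int(np.argmax(log_k))
    uncertainty = 1.0 - np.exp(log_k[i])
    return xi_d[i], tau_d[i], uncertainty

# Toy demonstration that visits the origin twice (at t = 0 s and t = 2 s).
poses = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.0], [-0.1, 0.0], [-0.2, 0.0]])
times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
xi, tau = poses[:-1], times[:-1]      # states
xi_d, tau_d = poses[1:], times[1:]    # labels: the state that follows each state

origin = np.array([0.0, 0.0])
goal_early, t_early, _ = policy_step(origin, 0.0, xi, tau, xi_d, tau_d)
goal_late, t_late, _ = policy_step(origin, 2.0, xi, tau, xi_d, tau_d)
```

At the same position, the two time beliefs select different nodes, so the policy moves towards the first branch early on and towards the second branch later.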

C. Stability Analysis
From this GGP-based formulation, we can also conclude the following.
Proposition 1. Using the trajectory graph representation, the motion always converges to the proximity of the demonstration and continues towards the end of it.
Proof. Since the vector $\tilde{k}_\star^\top$ correlates the current position of the end-effector with only one node of the trajectory, and if there is no overlap in the trajectory, the robot will move towards the goal of the closest node. Then, node by node, it continues towards the end of the trajectory. ∎
A great advantage of the pose and time trajectory encoding is that overlapping is no longer possible, as the demonstrator cannot show two different robot positions simultaneously, leading to the absence of overlapping nodes, ambiguities, or undesired loops, and guaranteeing that the hypothesis in the proof of convergence is satisfied. However, this also means that, when only computing the correlation as a function of position, no physical overlapping of the trajectory can be demonstrated, such as when drawing an eight [31]. Figure 3 shows the different behavior in learning to draw the letter "B" (database from [30]) using a GP and a GGP, both using only the 2-D position. The first thing to highlight is that the kernel saturation makes the GGP converge towards the trajectory faster than the GP. As a consequence, when the robot is perturbed, the motion tends to go closer to the trajectory and continue from there. Nevertheless, this difference in the vector fields does not lead to unsafe sudden motions straight towards the attractor, due to the proposed attractor and stiffness regularization/saturation described in Section IV-C.
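The node-by-node argument of Proposition 1 can be checked numerically on a toy case. The sketch below is an assumption-laden illustration (the name `rollout`, the straight-line demonstration, and the step saturation value are chosen for the example): a position-only policy on a non-overlapping trajectory is rolled out from a perturbed start, always moving towards the label of the closest node with a saturated step.

```python
import numpy as np

def rollout(x0, xi, xi_d, max_step=0.05, n_steps=200):
    """Follow the label of the closest node, with a saturated step length."""
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        i = int(np.argmin(np.sum((xi - x) ** 2, axis=1)))  # closest node
        delta = xi_d[i] - x                                # towards its label
        norm = np.linalg.norm(delta)
        if norm > max_step:                                # attractor saturation
            delta = delta * (max_step / norm)
        x = x + delta
        path.append(x.copy())
    return np.array(path)

# Straight-line demonstration from (0, 0) to (1, 0), sampled every 0.01 m.
pts = np.stack([np.linspace(0.0, 1.0, 101), np.zeros(101)], axis=1)
path = rollout([0.3, 0.2], pts[:-1], pts[1:])
```

The rollout first approaches the line and then advances one node per step until it settles on the final pose, since the last node's label is the end of the demonstration.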

D. Comparison Between GPs and GGPs for Policy Learning
The letter "B" shows a clear ambiguity at the overlapping of the trajectory between the two humps. The robot must first move in and then move out of the intersection on the same line in order to continue towards the end of the trajectory. The learned behavior of the two fitting methods is different. The GP removes the overlapping ambiguity by considering it as noise. This results in cutting the motion without going down to the intersection of the curves, losing tracking accuracy. On the other hand, in the line overlapping, the GGP has a vector field pointing left when approaching from below and to the right when approaching from the top. This may lead to an ambiguous situation that can cause the robot to get stuck locally or, in general, not track the motion correctly. This motivates the use of a position and time-dependent policy, to remove any possible state overlapping.

E. Movement Disambiguation using Pose and Time-Dependent Policy
As explained in Proposition 1, no loops in the chain are allowed to guarantee good trajectory tracking. Thus, our solution is to consider also the time belief (t b ) in the state. Figure 4 shows the evolution of the vector field for different time beliefs. The chain element of the trajectory for the t b indicated above the figure is highlighted with a green dot. From the figure, it is possible to observe how the previously encountered ambiguity is elegantly solved. In fact, the robot gets into the valley and then out without getting stuck.
In order to simulate the behaviors of a GGP with or without a self-update of the time belief, 200 different trajectories are rolled out starting from the origin of the demonstration. In order to take into account the inaccuracy of the low-level (impedance) controller, Gaussian noise of magnitude 0.01 is added to the attractor when computing the new position. Without the time-belief update, the tracking is good on average until the start of the intersection of the two humps, from where the performance degrades due to the ambiguous states.

The proposed SIMPLe framework, summarized in Algorithm 1, consists of three main parts. First, the human teacher provides kinesthetic demonstrations (Section IV-A), from which a time and position-dependent model (Section III) is learned. Second, the proposed method enables the human to provide demonstrations and to make interactive corrections (Section IV-A), which are leveraged for learning the trajectories and synchronization of bimanual tasks (Section IV-B). Third, the bimanual task can be executed. We employ Cartesian impedance control to facilitate physical interactions during demonstrations, corrections, and autonomous execution (Section IV-C); safety is ensured thanks to the proposed stiffness regulation (Section IV-D) and the coupling between manipulators (Section IV-E).
Our method aims to enhance the teaching ability of non-expert users while guaranteeing a safe interaction while teaching, correcting, and executing bimanual tasks. To cope with the complexity of teaching bimanual tasks, SIMPLe provides an interactive kinesthetic teaching (KT) approach that allows the user to teach one arm at a time and then to teach how to synchronize them using touch, by leveraging the time and pose-dependent GGP formulation presented in Section III. To the best of our knowledge, SIMPLe is the first framework to employ IIL on bimanual setups. Nevertheless, SIMPLe does not restrict users from teaching (and correcting) both arms simultaneously, and it can be applied to single-arm manipulation tasks without any loss of generality.

A. Teaching from Kinesthetic Demonstrations and Corrections
LfD allows non-expert users to program robots to perform complex tasks without any programming knowledge. Different interfaces can be used to transfer data to the robot, such as teleoperation devices, touch screens or physical interaction with the robot's embodiment, obtaining a KT approach. When the user is teaching a task, the stiffness and damping of the Cartesian impedance controller are set to zero, allowing the user to easily move the robot. The positions ξ and times τ of the demonstrated trajectories are recorded, and their respective goals, ξ d and τ d are obtained by shifting ξ and τ forward in time (Alg. 1, lines 1 to 4).
After learning the motion from a kinesthetic demonstration, the user can reshape the trajectory of each arm to achieve, for example, coordination between the arms in the execution of the task. Given the Cartesian impedance controller (see Section IV-C), kinesthetic corrections can be performed by simply applying an external force. Such a controller allows the human to be in full control if the stiffness is set to zero, or the robot can gradually increase its control by regulating the stiffness.
Additionally, given the time and pose-dependent policy (see Sec. III), the demonstrator can also drag the robot forward or backwards in time along its trajectory. This property can be used, for example, to make the execution of the initial demonstration faster [32], [33], to make the robot throw objects [19], or for synchronization learning, as proposed in this paper.

B. Interactive Learning of Bimanual Tasks
When teaching bimanual tasks, it is not always easy or feasible to provide kinesthetic demonstrations with both arms simultaneously, especially when using large redundant manipulators. Additionally, even when skilled users are able to teach a bimanual task by moving each end-effector with a single hand, they may perform a sub-optimal trajectory, or an ineffective one, given the task complexity.
In SIMPLe, the movement of each arm can be executed independently according to the GGP formulation described in Section III.
The proposed interactive learning method offers many possibilities for non-expert users to teach complex bimanual tasks. For example, they can demonstrate the movement for picking up a box one arm at a time and then teach the robot to coordinate the two independent trajectories and apply enough pressure on the sides of the box to execute the task successfully. Moreover, repetitive tasks like object hand-over can also be initially demonstrated one arm at a time, with kinesthetic corrections later used to teach how to coordinate both arms. Thanks to the calculation of the model as a function of position and time (belief), the user can also bring the robot back to the start of the trajectory and teach it (with minimum interaction effort) to perform the task multiple times.

C. Safe Cartesian Impedance Control
Figure 6 illustrates the employed Cartesian impedance controller in each of the two manipulators, which emulates the behavior of a mass-spring-damper system:
$$\Lambda(q)\,\ddot{x} = K \Delta x - D \dot{x} + f_{ext}, \qquad (7)$$
where $\Lambda(q)$ is the Cartesian inertia matrix of the physical system, $K$ and $D$ are the stiffness and damping, which are symmetric, positive-definite matrices; $f_{ext}$ is the total external force, and $\Delta x = x_g - x$ is the displacement between the goal and end-effector poses. The damping matrix can be designed to simulate a critically damped system [34]. In this framework, after computing the orthogonal decomposition of $K$, i.e., $K = R \tilde{K} R^\top$, where $\tilde{K}$ is a diagonal matrix, then $\tilde{D} = 2 \tilde{K}^{1/2}$ and $D = R \tilde{D} R^\top$. Note that, since $\tilde{K}$ is a diagonal matrix, the square root is applied to every element of the diagonal. The Cartesian impedance controller takes as input a stiffness matrix and a displacement vector (lines 12 and 22 in Algorithm 1); in order to enhance safety when interacting with humans [35], it is necessary to saturate the attractor displacement and the stiffness to a maximum safe value.
To help define the bounds, we can compute them as a function of the desired maximum free-movement velocity ($v_{max}$) and the maximum applicable static force of the end-effector ($F_{max}$) (in absolute values).
First, we compute an upper bound for the maximum displacement. Considering Equation (7), when the robot is in free movement, i.e., $f_{ext} = 0$, the maximum velocity happens for $\ddot{x} = 0$, that is to say:
$$K \Delta x = D \dot{x}. \qquad (8)$$
Thus, given the currently set stiffness $K$ and the desired maximum allowed velocity $v_{max}$, $\Delta x$ needs to respect:
$$|\Delta x| \le 2 K^{-1/2} v_{max}, \qquad (9)$$
obtained after using the definition of damping. Before being sent to the robot, $\Delta x$ is saturated in order to respect the upper bound. Moreover, taking into account the maximum static force ($F_{max}$) when $\dot{x} = 0$ and $\ddot{x} = 0$, an upper bound on the stiffness can be found, such that:
$$|K \Delta x| \le F_{max} \;\Rightarrow\; 2 K^{1/2} v_{max} \le F_{max}.$$
Hence, since the matrix $\tilde{K}$ is diagonal, we can find the upper bound of each element in the $i$-th row and column ($\tilde{K}_{ii}$) as:
$$\tilde{K}_{ii} \le \left( \frac{F_{max}}{2\, v_{max}} \right)^2, \qquad (14)$$
so that, in every single component, the value of the principal stiffness is saturated in order to respect this inequality.
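The damping design and the two safety bounds can be sketched numerically. The helper names below are hypothetical, and the bounds assume the critical-damping choice of the text (principal damping equal to twice the square root of the principal stiffness), which gives a displacement bound of $2 v_{max}/\sqrt{k}$ and a stiffness bound of $(F_{max}/(2 v_{max}))^2$ per principal direction. Treat this as a sketch under those assumptions, not as the controller used on the robot.

```python
import numpy as np

def critical_damping(K):
    """D = R (2 K~^{1/2}) R^T from the orthogonal decomposition K = R K~ R^T."""
    eigvals, R = np.linalg.eigh(K)          # K is symmetric positive-definite
    return R @ np.diag(2.0 * np.sqrt(eigvals)) @ R.T

def max_displacement(k, v_max):
    """Largest attractor displacement that keeps the free-motion speed <= v_max."""
    return 2.0 * v_max / np.sqrt(k)

def max_stiffness(f_max, v_max):
    """Largest principal stiffness that keeps the static force <= f_max."""
    return (f_max / (2.0 * v_max)) ** 2

def saturate(delta_x, limit):
    """Clip every component of the displacement to [-limit, limit]."""
    return np.clip(delta_x, -limit, limit)
```

A quick sanity check of the damping design is that `D @ D` equals `4 * K` for any symmetric positive-definite `K`, since both share the same eigenvectors.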

D. Stiffness Regulation
Regulating the stiffness can be used to incrementally increase the stiffness after each demonstration, reducing human control as the learned movement is interactively refined [36]. Alternatively, the stiffness can be regulated when perceiving strong external forces, as a disagreement detection [37]. Similarly, [32] proposed a variation of a DMP where the robot variable stiffness and the regressor phase are modulated to adapt to human kinesthetic demonstrations.
When more demonstrations are provided, the measure of aleatoric uncertainty, i.e., the variability in the demonstrations, can be used to regulate the tracking stiffness of the robot [38]. In contrast, we propose to exploit the epistemic uncertainty quantification of the policy ($\sigma$), enabling automatic regulation of the Cartesian impedance controller's stiffness and hence switching control between robot and human.
Mathematically, the stiffness is regulated according to Equation (15), where $\sigma_{tr}$ is the uncertainty threshold that is used to detect the disagreement. Note that $\sigma(x)$ goes from 0 when close to the trajectory, to 1 when at infinite distance from it. Thanks to this stiffness regulation, when the robot is dragged into regions of high uncertainty, it reduces the force it applies to the user who is perturbing the trajectory. This behaviour can be conceptualized as the robot's non-verbal request for teaching or for repositioning into regions closer to the demonstration.
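As an illustration of uncertainty-based stiffness regulation, the sketch below keeps the full stiffness while the uncertainty is under the threshold and lets it decay linearly to zero as the uncertainty approaches 1. The piecewise-linear law and the name `regulate_stiffness` are assumptions made for this example and are not claimed to be the form of Equation (15).

```python
def regulate_stiffness(sigma, k_max, sigma_tr):
    """Scale the stiffness down once the epistemic uncertainty exceeds sigma_tr.

    Hypothetical piecewise-linear law: full stiffness for sigma <= sigma_tr,
    then a linear decay that reaches zero at sigma = 1.
    """
    if not 0.0 <= sigma_tr < 1.0:
        raise ValueError("sigma_tr must lie in [0, 1)")
    if sigma <= sigma_tr:
        return k_max
    return k_max * (1.0 - sigma) / (1.0 - sigma_tr)
```

With this choice the robot tracks stiffly near the demonstration and becomes fully compliant far from it, which matches the qualitative behaviour described in the text.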

E. Dual Cartesian Impedance Control
Differently from the execution of a single arm, when a two-arm policy is executed, extra attention is required regarding the mechanical coupling of the movement. For example, when picking up a box with two hands and executing a re-shelving operation, in case of a perturbation of one arm, the other arm must also follow the perturbed movement. In this case, both arms must be mechanically coupled, meaning that, in the impedance control of each arm, we add an extra coupling force $F_c$, defined in Equations (16) and (17), where $\Delta x^{des}_{rel} = x^{des}_r - x^{des}_l$ is the desired displacement between the two end-effectors controlled by the SIMPLe algorithm. A simple schematic visualization of the proposed bimanual impedance control is displayed in Fig. 6, where each end-effector is coupled with a stiffness (and damper) with respect to its goal but also with a relative stiffness (and damper) between the two end-effectors.
Note that the proposed safety saturation and regulation processes described, respectively, in Section IV-C and Equation (15) are applied on a per-arm basis, and thus also apply to single-arm setups. For a bimanual setup, the displacement and stiffness for the coupling forces ($F_c$, defined in Equations (16) and (17)) are saturated and regulated similarly to Equations (9), (14), and (15).
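One way to picture the coupling is as a virtual spring-damper between the two end-effectors. The sketch below is a hypothetical form, not necessarily that of Equations (16) and (17): the function name, the equal-and-opposite forces, the critical damping of the coupling, and the default values are assumptions for illustration. The relative error is saturated before being turned into a force.

```python
import numpy as np

def coupling_forces(x_r, x_l, v_r, v_l, x_r_des, x_l_des, k_c=800.0, max_err=0.05):
    """Spring-damper coupling on the relative pose of the right and left arms."""
    rel_des = x_r_des - x_l_des                   # desired relative displacement
    rel = x_r - x_l                               # current relative displacement
    err = np.clip(rel_des - rel, -max_err, max_err)   # saturated relative error
    d_c = 2.0 * np.sqrt(k_c)                      # critical damping, assumed
    f_r = k_c * err - d_c * (v_r - v_l)           # force on the right arm
    return f_r, -f_r                              # left arm gets the opposite

zero = np.zeros(3)
x_r_des, x_l_des = np.array([0.3, 0.0, 0.0]), np.array([-0.3, 0.0, 0.0])
# Right arm pushed 2 cm along +x while the left arm sits at its goal.
f_r, f_l = coupling_forces(np.array([0.32, 0.0, 0.0]), x_l_des, zero, zero,
                           x_r_des, x_l_des)
```

In this example the right arm is pulled back along -x and the left arm is pushed along +x, so the unperturbed arm follows the perturbed one, as the text requires.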

V. REAL ROBOT VALIDATION
We performed the experiments with two 7-DoF Franka Emika Panda manipulators placed vertically on a table and with the same orientation. The impedance control was implemented as described in Section IV. The two manipulators shared their Cartesian poses through a shared memory, allowing the calculation of the mechanical coupling force. The experiments presented in Sections V-D, V-A, and V-B were performed using a custom 3D-printed plate end-effector depicted in Fig. 10, which features a layer of soft foam for reducing the interaction forces during impacts with objects as in [39]; the experiment presented in Section V-C was performed using the Franka gripper. The impedance control framework, written in C++, makes use of the Robot Operating System (ROS) to interface with Alg. 1, written in Python.
We perform 5 experiments with the real robot setup: i) the interactive synchronization of the picking motion of a bottle crate when the demonstration is provided separately for each robot, showing how SIMPLe is used to learn a bimanual synchronization; ii) the interactive correction in picking a different crate compared to the one of the original demonstration, showing how to use the GGP formulation to modify the motion locally; iii) a handover task, where one robot picks and places an object and the other robot picks it from the first robot's goal location and places it at another position, showing the ability to restart the execution of a trajectory by simply dragging the robot to the starting location; iv) a supplementary user study to compare teleoperation and KT, the two most common types of demonstration approaches; v) a supplementary user study to compare giving interactive corrections with giving new demonstrations.
The first three are technical experiments that highlight and validate different functionalities of the proposed method. Each experiment was conducted in 5 trials, and for each of them, the final learned motion was performed 5 times after demonstration and correction(s). This approach allowed for the assessment of the reliability of the learned skill. The last two are supplementary user studies that evaluate the type of input data for the proposed method by comparing two human demonstration approaches, and that compare giving corrections with giving new demonstrations. These additional insights provide a better understanding of the input data generation method and of the adjustment of the robot's skill in bimanual cases. For all the experiments, we used a position-time kernel for the GGP that computes the correlations and updates the time beliefs online. We use a negative exponential kernel with a length scale of 0.05 m for the space correlation and 0.05 s for the time correlation. The sigma threshold is set to σ(λ), which is the uncertainty when the closest point is at distance λ. The Cartesian stiffness is kept at 600 N/m for the linear components and 30 Nm/rad for the rotational ones. The attractor distance is saturated at 0.05 m, implying that the expected maximum applicable force is 30 N in every linear Cartesian direction and the maximum expected linear velocity is ≈ 0.6 m/s in every linear direction. The rotation delta is saturated at 0.15 rad, implying a maximum torque of 4.5 Nm in every rotational component and a maximum velocity of ≈ 0.4 rad/s. The coupling stiffness is set to 800 N/m in the linear components and 0 for the rotational ones. The relative error is also saturated at 0.05 m. A video of the experiments can be found at: https://youtu.be/GasxgbJZHdQ.
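The force and torque limits quoted above follow from multiplying each stiffness by its saturated displacement. The short Python sketch below reproduces that arithmetic. It is only an illustration: the helper names `saturate` and `max_spring_effort` are hypothetical and not taken from the SIMPLe implementation, per-axis clamping is assumed from the phrase "in every linear Cartesian direction", and the 40 N coupling bound is our own product of the reported coupling stiffness and relative-error saturation, assuming a spring-like coupling force.

```python
def saturate(delta, limit):
    """Clamp every component of a displacement to [-limit, +limit].

    Per-axis clamping is an assumption made for this sketch.
    """
    return [max(-limit, min(limit, d)) for d in delta]


def max_spring_effort(stiffness, limit):
    """Upper bound on a spring force or torque: stiffness times saturated displacement."""
    return stiffness * limit


# Values reported in the text.
K_LINEAR = 600.0    # N/m, linear Cartesian stiffness
K_ROT = 30.0        # Nm/rad, rotational stiffness
K_COUPLING = 800.0  # N/m, linear coupling stiffness
D_LINEAR = 0.05     # m, attractor distance saturation
D_ROT = 0.15        # rad, rotation delta saturation
D_RELATIVE = 0.05   # m, relative error saturation

f_max = max_spring_effort(K_LINEAR, D_LINEAR)        # about 30 N per linear axis
tau_max = max_spring_effort(K_ROT, D_ROT)            # about 4.5 Nm per rotational axis
fc_max = max_spring_effort(K_COUPLING, D_RELATIVE)   # about 40 N (derived here, see lead-in)

# Hypothetical attractor offset: two axes exceed the 0.05 m limit and are clamped.
clamped = saturate([0.20, -0.01, 0.07], D_LINEAR)
force = [K_LINEAR * d for d in clamped]
```

Because the displacement is clamped before it is multiplied by the stiffness, the commanded spring force stays bounded no matter how far the robot is dragged from the attractor.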

A. Asynchronous Crate Picking
When a pianist approaches studying a new piece, they do it one hand at a time. After mastering the movement with each hand, they start learning how to successfully coordinate the combined execution. Inspired by this idea, in this validation experiment, the user is asked to demonstrate how to best pick a crate, first with the right and then with the left manipulator. However, when the independently learned behaviours were executed with SIMPLe, the coordination was off, and the handling of the crate was not stable. In Section IV-B, we highlighted how user feedback can be used to reshape the trajectory and that the reactive formulation of SIMPLe makes the trajectory "virtually" stop: this feature can be used to learn a bimanual task by simply coordinating the separately recorded policies, see Figure 7.
The effect of the human input can be appreciated in Figure 7. The original demonstrations are represented by dashed lines. Even if the movement of the two demonstrations looks correctly symmetric with respect to the y-plane, the right arm is slower. However, after only one correction round, the motion of the two arms is synchronized, as depicted with solid lines. Given the synchronization obtained, in the next round, the user focused on increasing the applied pressure on the side of the crate to increase the grasp reliability. In the 5 experiment repetitions, the user consistently provided the necessary synchronization corrections. One trial had an additional correction round, and two trials had two extra correction rounds. After the interactive correction rounds, the robot always placed the crate correctly. The Cartesian error of the final crate position with respect to the final round of correction, considering 25 repetitions (5 executions × 5 trials), has a mean of 0.021 m and a standard deviation of 0.009 m.

B. Synchronous Crate Picking
In this experiment, we focused on teaching the same task of picking a box, but with bimanual demonstrations and corrections. In particular, we showed that even with only one bimanual demonstration and a few rounds of corrections, the task execution was successful. We also tested the possibility of locally modifying the original policy to pick a different box placed at a higher level. Figure 8 highlights how the robot can be dragged higher sooner, at around 10 seconds, and how, after picking the crate, the robots follow the original policy, being able to place the crate and go to the resting position autonomously. In the 5 experiment repetitions, the user provided two rounds of correction in the first two trials, but only one in the last three. The final position error of the box has a mean of 0.005 m and a standard deviation of 0.004 m. It is important to notice that, even when the box's position is known, generalizing the motion with a task-parameterized approach is not trivial. In fact, the policy would have to move with respect to the picking frame and then, after a successful pick, switch to moving with respect to the goal crate. This logic has been shown to be successfully implemented in [33] but also to be a source of generalization ambiguities [40]. In general, shared-control teaching, with the user only taking control locally, can drastically reduce the burden of giving new complete demonstrations.

C. Object Hand-over
Giving repetitive demonstrations is another tedious task: being able to demonstrate the task only once and then interactively assemble a long trajectory allows the teaching of complex bimanual coordination tasks, like stirring a coffee mug [41] or learning a handover task. To validate SIMPLe in this circumstance, we taught the right arm to pick up a box and place it on the central separation line between the two robots. Then, the left arm would pick up the box and place it in front of itself. The goal is to show how dragging the robot around can be used for re-synchronization or local trajectory reshaping and also as a movement "reset".
The original demonstrations are displayed with a dashed line in Figure 9. When executing the motion with SIMPLe, the human can safely apply a force on the robot to stop its execution or drag it to another desired position of the motion. At the beginning of Figure 9, a force is applied to the left manipulator (highlighted by a red circle) to temporarily stop it from moving, allowing the right arm to successfully pick a box and place it on the center line. As soon as the user releases the robot, it is free to move and can pick up the box and reach its goal. To allow the repetition of the motion, the user applies a larger external force (observable as peaks), causing a drop in stiffness since the robot is probably dragged into a region of space with a lower correlation according to (15). Every time the robot finishes its pick-and-place task, if the user is willing to repeat it, they only have to drag the robot to the desired position of the trajectory. The user teaches the motion multiple times, as indicated by the colored patches in the figure. We measure the final error in placing the box after the handover, executed 5 times in 5 different demonstration trials. The mean error and standard deviation are 0.011 m and 0.008 m, respectively.

D. User study: Teleoperation vs. Kinesthetic Teaching
The algorithm itself works with different data from different types of demonstrations. However, since the obtained input data depends on the type of demonstration, the demonstration method is an essential part of the whole framework. Therefore, we conduct a supplementary user study to provide additional insight into the effects of the demonstration method by comparing the two most common demonstration approaches: teleoperation and kinesthetic guidance. There are studies comparing both teaching approaches, but they were conducted for a single arm [42], [43]. The study in this paper looks into this subject from a bimanual perspective.
Fig. 7. Interactive synchronization of a bimanual picking task. The dashed lines are the demonstrations recorded in the independent demonstration phase a) and b). Since they are not perfectly synchronized, the autonomous execution would fail, hence, the human feedback in c) allows a successful synchronization, depicted with solid lines.
Fig. 8. Use interactive learning to teach the robot how to modify the original trajectory so the robot can learn how to pick a crate that is at a different height.
Section II highlighted how different works focus on enhancing the teleoperation ability of non-expert users using assistive techniques like shared autonomy [9]-[11]. Since SIMPLe works with both teleoperated and kinesthetic demonstrations/corrections, we wanted to study which is more user-friendly. However, a definitive answer is not easy to obtain: the teleoperation device can have a strong influence, as can the dimension of the robot or the requested task. For the conducted user study, we asked 7 non-expert users to perform a relatively simple task: pick a box and stack it on top of another. These 7 users were all male, with ages ranging from 23 to 40 years. To mitigate the learning bias in the results, participants had a familiarization phase for each teaching modality, in which they could restart the teaching session up to 5 times. For every new participant, the first teaching modality was alternated between teleoperated and kinesthetic, to remove the bias due to their familiarization with the task.
For metrics, we measured the success rate in solving the task and the total teaching time for each method. For subjective analysis, we asked the participants to complete a NASA TLX questionnaire. We conducted a paired samples t-test to verify whether the time required for KT is significantly shorter than for teleoperation with the 6D mice. However, 3 people out of 7 failed to perform a successful teleoperation because they did not manage to coordinate the two arms well, making the robot self-collide or reach a joint limit. Therefore, we set the time of each failed attempt to the maximum time among the successful ones. The test showed that KT requires less time than teleoperation with the given hardware, with the difference being statistically significant (p < 0.05). Figure 10 illustrates the average NASA TLX scores among the different users. Teleoperation was rated as more mentally demanding and more frustrating to perform. In general, we observed that users tended to focus on teleoperating one arm at a time, making handling the box impossible. When providing KT, the physical contact with the robot helped them to better understand the best trajectory and to accomplish the task successfully.
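The timing analysis described above (replace failed teleoperation times with the slowest successful one, then run a paired test) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the reading of "maximum time of the non-failing ones" as the slowest successful teleoperation attempt is our interpretation, and all numbers in the example are placeholders invented for this sketch, not measurements from the study.

```python
import math
import statistics


def impute_failures(times, succeeded):
    """Replace the time of every failed attempt with the slowest successful time."""
    worst_success = max(t for t, ok in zip(times, succeeded) if ok)
    return [t if ok else worst_success for t, ok in zip(times, succeeded)]


def paired_t_statistic(a, b):
    """Return the paired-samples t statistic of a - b and its degrees of freedom."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# PLACEHOLDER values in seconds, invented for illustration only (NOT study data).
kt_times = [40.0, 55.0, 38.0, 60.0, 47.0, 52.0, 44.0]
teleop_times = [90.0, 120.0, 75.0, 110.0, 95.0, 130.0, 85.0]
teleop_ok = [True, False, True, False, True, False, True]  # 3 of 7 failed

teleop_imputed = impute_failures(teleop_times, teleop_ok)
t_stat, dof = paired_t_statistic(teleop_imputed, kt_times)

# One-sided critical value of Student's t for alpha = 0.05 and 6 degrees of freedom.
T_CRITICAL = 1.943
kt_is_faster = t_stat > T_CRITICAL
```

Capping failed attempts at the slowest successful time is conservative for this comparison: a failed teleoperation attempt is never counted as slower than the worst successful one, so the imputation cannot inflate the measured advantage of KT.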

E. User study: Corrections vs. New Demonstration
Besides the input data generation method, another key factor in teaching bimanual manipulation is how humans correct existing skills and whether they prefer correcting or giving a new demonstration. To test this, 12 non-expert users participated in an experiment structured as follows. The user was asked to demonstrate the task of placing a box on the crate. The learned motion was then shown to the user after an offset was applied to the initial position of the box. The user was then tasked with kinesthetically correcting the initial policy to account for the change in the initial position. This was repeated two times for different initial positions of the box. At this point, the user should have a sufficient understanding of what it means to give a demonstration or a correction.
The second part of the experiment was designed to find the user preference for increasing lengths of the demonstration. The user was tasked with first demonstrating the task of placing the box on the crate. After the demonstration, an offset was applied to the box and the user was given the choice to either correct or re-demonstrate given the new initial condition. For the second iteration, the task remained the same with the additional requirement that, after picking up the box and before placing it on the crate, the user had to move the box through a different location as a waypoint. This was done to artificially lengthen the demonstration. Once again, an offset was applied to the initial position of the box and the user was given the choice between correcting and re-demonstrating. This was done one last time with two waypoints.
Given the choice, out of the 12 participants, 11, 8, and 10 chose to adjust the policy with interactive corrections for the experiment with zero, one, and two waypoints, respectively, rather than providing new demonstrations. Thus, a new demonstration was preferred in only 7 of the 36 trials, which indicates a strong preference for interactive corrections. Afterwards, to evaluate their experience, the participants were asked to answer several Likert scale questions related to their perception of the corrected skill and their physical/mental load. The results can be seen in Table I, where the number in each cell represents the number of participants who chose a particular level of agreement on the Likert scale.
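The count of 7 out of 36 follows directly from the per-condition numbers reported above, as the short tally below shows. The variable names are illustrative only.

```python
N_PARTICIPANTS = 12

# Participants who chose an interactive correction, keyed by number of waypoints.
chose_correction = {0: 11, 1: 8, 2: 10}

total_trials = N_PARTICIPANTS * len(chose_correction)  # 12 participants x 3 conditions
corrections = sum(chose_correction.values())            # trials where a correction was chosen
new_demonstrations = total_trials - corrections         # trials where a new demonstration was chosen
correction_share = corrections / total_trials           # fraction of trials with a correction
```

Corrections were therefore chosen in 29 of the 36 trials, roughly 81%.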
The users found that both new demonstrations and corrections were effective at improving the robot's execution of the task. The users were split on whether bimanual demonstrations are tedious. In general, they found interactive corrections more physically demanding than providing new demonstrations, probably because the robots were already performing movements rather than being completely compliant as during new demonstrations. During the experiments, it was observed that people who were shorter, had smaller hands, or were less muscular tended to struggle more with correcting a policy. Those participants thus might have preferred giving a new demonstration over a correction. However, the users perceived interactive corrections as slightly less mentally demanding, probably because they needed to pay attention only to specific segments as opposed to the whole task.

VI. CONCLUSION
This paper contributes to the field of bimanual manipulation with an interactive kinesthetic learning framework named SIMPLe. It uses a novel formulation of GP, named GGP, that is computationally efficient and ensures local and global stability of the motion while retaining an estimation of epistemic uncertainties. Thanks to the kernel formulation, the policy encoding can go from purely time-dependent to purely position-dependent or to a combination of both. At the same time, its graph representation allows an online update of the time belief that, unlike the robot position, cannot be directly measured. The study reports a comparison of a GP with the novel GGP (see Figure 3) and an ablation study on whether the time dependence is considered (see Figure 5). We conclude that considering the time and properly updating its beliefs allows dealing with more complex and possibly ambiguous demonstrations.
Various technical validation experiments were performed on a real bimanual setup to demonstrate the key functionalities and capabilities of the proposed method. The supplementary user studies gave interesting insights into how humans feel when teaching and correcting a robot with different modalities. Our study reported that users are faster and less stressed when performing kinesthetic teaching compared to teleoperation. Furthermore, most users prefer giving corrections to completely new demonstrations.