Optimal Task and Motion Planning and Execution for Multiagent Systems in Dynamic Environments

Combining symbolic and geometric reasoning in multiagent systems is a challenging task that involves planning, scheduling, and synchronization problems. Existing works have overlooked the variability of task durations and the geometric feasibility that are intrinsic to these systems because of the interaction between agents and the environment. We propose a combined task and motion planning approach to optimize the sequencing, assignment, and execution of tasks under temporal and spatial variability. The framework relies on decoupling tasks and actions, where an action is one possible geometric realization of a symbolic task. At the task level, timeline-based planning deals with temporal constraints, duration variability, and the synergic assignment of tasks. At the action level, online motion planning plans the actual movements, dealing with environmental changes. We demonstrate the approach's effectiveness in a collaborative manufacturing scenario, in which a robotic arm and a human worker shall assemble a mosaic in the shortest time possible. Compared with existing works, our approach applies to a broader range of applications and reduces the execution time of the process.


I. INTRODUCTION
HUMAN-ROBOT collaboration (HRC) boosts the flexibility of manufacturing processes, although the inefficient coordination between humans and robots often jeopardizes productivity [1]. From a planning perspective, efficiency in HRC is tied to different intertwined problems. First, the system should find a suitable sequence of operations (task planning), assign them to the agents (task assignment), and schedule their execution (scheduling). At run-time, the execution of the operations should be adapted to the human's and robot's state (motion planning and replanning). All these steps should also consider the variability of the duration and the good (or bad) synergy of simultaneous collaborative operations. The complexity of the overall planning problem limits the effectiveness of existing methods in real-world scenarios, and standardized shared approaches to task and motion planning are still to come.
In this paper, we propose a tiered approach interleaving task planning, scheduling, assignment, and action planning for multi-agent systems. The method addresses the tasks' temporal and geometric uncertainty by decoupling the abstract representation of a task from all its possible realizations. Timeline-based planning reasons on abstract tasks, while online action planning optimizes their geometric implementations. Compared with existing methods, our approach deals with a broader variety of real-world problems and reduces execution and idle times.

A. Related works
This paper deals with interleaving task planning and motion planning and with how to consider human behaviors in this process. Existing methods address the first aspect by following the combined task and motion planning (TAMP) paradigm [2]. Usually, TAMP provides a task planner able to reason geometrically through calls to a motion planning algorithm. In such a hierarchical approach, a task planning algorithm finds a feasible sequence of actions, and a motion planner checks for geometric feasibility [3]-[7]. Most TAMP methods focus on the feasibility of the plan rather than its optimality (with a few exceptions [8], [9]). Few works address temporal planning to consider task duration [10].
HRC-oriented works usually focus on sub-problems such as scheduling human and robot actions [11]-[13], or cooperative planning at a symbolic level [14]-[16]. Few works address TAMP by explicitly modeling the human agent [17], [18]. For example, [19] and [20] proposed a hierarchical agent-based task planner, where complex tasks are decomposed into simple actions. The method improves the collaborative experience by considering human preferences as social costs, but throughput-oriented objectives are not considered [21], [22]. Among manufacturing-oriented methods, [23] optimizes the ergonomics of the human worker by using an online workflow scheduler. [24] and [25] proposed a TAMP framework for planning and executing tasks using first-order logic graphs. A contingent-based approach was proposed in [26] and [27] to deal with uncertainty on the outcome of actions.
The approaches above focus on finding feasible plans and do not consider: (i) process throughput; (ii) temporal constraints and uncertainty; (iii) human-robot synergy. Regarding (i) and (ii), timeline-based task planning [28] has proved to be a powerful approach in many real-world applications [29], [30]. The value of this approach lies in integrating planning and scheduling in a unified reasoning framework, making decisions about what actions to perform and when. This approach can also model the features' controllability, i.e., whether the planner can control the task's beginning or end [31]. Timelines were applied to HRC in [32], although the integration with motion planning was inefficient because it reasoned at a low level of abstraction (point-to-point movements). Consequently, motion plans were pre-computed, hindering the flexibility of the approach in dynamic environments.

B. Contribution
As shown in Figure 1, we propose a hierarchical planner where the higher layer reasons over symbolic tasks, optimizing their order, scheduling, and assignment under temporal constraints and duration variability.At the same time, the lower level turns the symbols into robot motions, selecting the optimal task execution among all the possible alternatives.
Our formulation relies on the definition of ACTIONS as a set of instances of a TASK (Section II). Compared to previous works, such a formulation explicitly maps a symbolic TASK to all its possible geometric realizations. From the task planning point of view, this formulation considers the duration uncertainty of the tasks, temporal constraints, and the possible synergy of simultaneous tasks performed by the different agents. At the action level, the robot plans and executes its motions based on the current context.
Then, we propose algorithms to solve the proposed task and action planning problems (Section III). On the one hand, we convert the optimal task planning problem into a multi-objective problem (a novelty in timeline-based planning) and use the notion of Pareto optimality to optimize the coupling of simultaneous tasks. On the other hand, we make the optimal action planning problem tractable online by converting it into a multi-goal motion planning problem that can be solved with efficient off-the-shelf algorithms.
Compared to previous works, our formulation is robust to the temporal and spatial uncertainty typical of hybrid multi-agent systems (e.g., human-robot collaboration). We assess the broad applicability of the approach and its superiority to existing works qualitatively and experimentally (Sections IV and V), showing that our approach applies to a broader range of applications and reduces the execution and idle times of the process. A video of the experiments is attached to the manuscript.

A. Definitions and Approach at a Glance
Our approach builds on the following definitions:
Definition 1 (Worker): It is an agent that performs a job assigned by the system. It can be a human worker or a robot.
Definition 2 (Task): It is a step of the production process performed by a worker. TASKS do not provide geometrical information on their execution, as they model an operation at a higher level of abstraction. Each TASK might be performed in several ways, making its duration a random variable rather than a scalar value.
Definition 3 (Robot Configuration): It is a vector collecting the joint positions and the auxiliaries' state¹.
Definition 4 (Robot Movement): It is a transition between two ROBOT CONFIGURATIONS, which can be realized by an infinite number of possible trajectories that satisfy the physical constraints of the robot.
Definition 5 (Action): An ACTION is a sequence of ROBOT MOVEMENTS. It is a feasible implementation of a TASK. Each ROBOT MOVEMENT is defined by the ACTION parameters².
The proposed hierarchical approach is shown in Figure 1. The task planner uses a planning model of the process to search for a feasible, possibly optimal sequence of TASKS. The planning model specifies the TASKS to be performed with temporal and allocation constraints. The Task Plan Execution module dispatches the target TASK to the workers. The Action Planner converts each TASK into a set of feasible ACTIONS. Then, it chooses the best ACTION among the feasible ones and sends the motion plan to the robot controller. If the Action Planner fails at planning or executing (e.g., no trajectory could be computed or the task execution fails at run-time), the Task Planner re-plans according to the Sense-Plan-Act paradigm [33].
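The outer Sense-Plan-Act loop described above can be sketched as follows. This is a minimal, hypothetical rendering: the class names, canned plans, and feasibility test are illustrative stand-ins for the framework's modules, not the authors' software.

```python
# Hypothetical sketch of the Sense-Plan-Act loop of Figure 1.
# TaskPlanner/ActionPlanner are stubs, NOT the paper's implementation.
class TaskPlanner:
    def __init__(self, plans):
        self._plans = list(plans)        # canned task plans, tried in order

    def plan(self):
        # Return the next candidate plan, or None if no plan is left.
        return self._plans.pop(0) if self._plans else None


class ActionPlanner:
    def __init__(self, feasible_tasks):
        self._feasible = set(feasible_tasks)

    def execute(self, task):
        # True = an ACTION realizing the TASK was planned and executed.
        return task in self._feasible


def execute_process(task_planner, action_planner, max_replans=5):
    for _ in range(max_replans):
        plan = task_planner.plan()       # PLAN
        if plan is None:
            break
        if all(action_planner.execute(t) for t in plan):  # ACT
            return True                  # whole plan executed successfully
        # an ACTION failed: SENSE the failure and replan at the task level
    return False


# First plan fails ("T3" has no feasible action), the replanned one succeeds.
tp = TaskPlanner([["T1", "T3"], ["T1", "T2"]])
ap = ActionPlanner({"T1", "T2"})
```

Calling `execute_process(tp, ap)` falls back to the second plan after the first one fails, mirroring the task-level replanning triggered by action-level failures.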

B. Model Formalization
The following elements describe a collaborative process: W = {H, R} is the set of workers, i.e., a human and a robot; P = {p_i} is the set of production targets; T = {t_j^{p_i}} is the set of TASKS necessary to carry out a production target p_i; A = {a_j} is the set of ACTIONS; D : T → R² is the duration function that associates a TASK with an interval [d_min, d_max]; T : T → T is the transition function that defines the valid transitions among TASKS; γ : T → {c, pc, uc} is the controllability tag. The tag is "controllable", "partially controllable", or "uncontrollable" depending on whether the system can decide both the start and the end of the execution, only the start, or neither, of the TASK a worker performs. F : T → {{H}, {R}, {H, R}} is a function that defines, for each TASK, which workers can execute it; S : A → T is a function that maps each ACTION to its corresponding TASK. SV = (V, T, D, γ) is a "state variable" that describes the behavior of a domain feature, represented by the set of tasks V = {v_i} ⊆ T. Such a set gathers all the valid TASKS that can be executed over time for that specific feature; T, D, γ are defined as above.
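For concreteness, the model elements above could be encoded along the following lines. This is an illustrative sketch under our own naming; the types and values are ours, not the authors' software.

```python
# Illustrative encoding of the model elements of Section II-B.
# Names (Task, Action, StateVariable, ...) are assumptions for this sketch.
from __future__ import annotations
from dataclasses import dataclass
from enum import Enum


class Controllability(Enum):      # the tag gamma : T -> {c, pc, uc}
    CONTROLLABLE = "c"
    PARTIALLY_CONTROLLABLE = "pc"
    UNCONTROLLABLE = "uc"


@dataclass(frozen=True)
class Task:
    name: str
    duration: tuple[float, float]        # D(t) = [d_min, d_max]
    controllability: Controllability     # gamma(t)
    eligible_workers: frozenset[str]     # F(t), subset of {"H", "R"}


@dataclass(frozen=True)
class Action:
    name: str
    task: Task                           # S(a) = t: the TASK it realizes


@dataclass
class StateVariable:
    values: list                         # V: the valid TASKS of the feature
    transitions: dict                    # T: valid transitions among tasks


# Example: a pick-and-place TASK only the robot may do, with one of the
# many ACTIONS (geometric realizations) that implement it.
pick = Task("PickPlace_A1", (4.0, 7.5),
            Controllability.PARTIALLY_CONTROLLABLE, frozenset({"R"}))
grasp_left = Action("grasp_from_left_A1", pick)
sv_R = StateVariable([pick], {pick.name: set()})
assert grasp_left.task is pick           # S maps the ACTION back to its TASK
```

The key design point mirrored here is the decoupling of the paper: several `Action` objects may share the same `Task`, while the duration interval and controllability live at the task level.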
A token x = (v, [e_min, e_max], [d_min, d_max]) specifies the execution of a task v ∈ V, where [e_min, e_max] is the end-time interval and [d_min, d_max] is the duration interval for task v. FTL_SV = {x_j} is a flexible timeline of a state variable SV, representing a temporal sequence of tokens x_j. Solving a collaborative TAMP problem consists of identifying a task plan, a temporal schedule, and an assignment of the tasks, considering that each task can be realized by a set of actions and each movement composing an action can be executed by an infinite set of trajectories. Each task plan is modeled through flexible timelines of tokens of state variables.

C. Task planning model for human-robot cooperation
Given the notation in Section II-B, we define the production goals, each worker's possible behaviors, and the synchronization rules to model a human-robot scenario. Refer to [34], [35] for a description of the formalization approach.
First, consider a set of high-level production targets V_p = {p_i}, where p_i ∈ P. Each p_i can be further associated with a set V_{p_i} = {t_j^{p_i}} that gathers the TASKS needed to carry out p_i, where t_j^{p_i} ∈ T³. Second, we consider a generic worker w_k ∈ W that may implement some of the needed tasks in V_{p_i}, and denote by V_{w_k} = {t_m^{p_i,w_k}} ⊂ V_{p_i} the subset of tasks t_j^{p_i} that worker w_k can do according to F.
In addition to the set of tasks, we need to define their precedence constraints, durations, and controllability.
To gather all this information compactly, we denote by SV_p = (V_p, T_p, D_p, γ_p) the production state variable associated with the high-level production targets V_p = {p_i}, and by SV_{p_i} = (V_{p_i}, T_{p_i}, D_{p_i}, γ_{p_i}) the production state variables associated with the production tasks V_{p_i} = {t_j^{p_i}} necessary for a particular production target p_i. The transitions T_p and T_{p_i} are usually an input of the model, while duration and controllability come from the worker modeling and selection (see below). Then, we denote by SV_{w_k} = (V_{w_k}, T_{w_k}, D_{w_k}, γ_{w_k}) the behavior state variable of the k-th worker (often one robot and one human). Transitions T_{w_k}, duration D_{w_k}, and controllability γ_{w_k} are usually inputs of the model. The duration is an interval that describes the duration uncertainty when worker w_k performs t_m^{p_i,w_k} ∈ V_{w_k}. Controllability γ_{w_k}(t_m^{p_i,w_k}) depends on the nature of the worker. Task t_m^{p_i,w_k} is uncontrollable if performed by a human, since humans may refuse to do a task or quit halfway through. Conversely, it is partially controllable if performed by a robot, as the robot may not be able to finish a task because humans obstruct all possible ways out.
³ For example, the set V_p could be {assemble A, check the quality of B, disassemble C}, and V_{p_1} = {take the bottom of part A, pick and place the screws of A, tighten the screws of A}.
As mentioned above, much information of the complete model is intertwined. For example, the execution time of a process task t_j^{p_i} results from the durations of the worker tasks that implement it and must be consistent with the temporal constraints; otherwise, the problem is infeasible. Similarly, the mapping of process tasks t_j^{p_i} into several feasible worker tasks t_m^{p_i,w_k}, according to the different constraints (precedence constraints, resource constraints, etcetera), is far from simple. Dedicated software tools autonomously compute such information to complete the model from user input (see Section V), using proper synchronization rules R. Given a set of state variables SV_p, SV_{p_i}, SV_{w_k} and synchronization rules R, we can now introduce the timeline-based planning formalization [28], [36]. We consider a generic state variable SV_g = (V_g, T_g, D_g, γ_g) and denote by x_j^g its tokens. We define a flexible timeline FTL_{SV_g} of a state variable SV_g as a sequence of tokens {x_j^g} that spans the process execution time and describes what the worker does over the process. Finally, a timeline-based plan π consists of a set of flexible timelines, one for each state variable, valid with respect to R. A timeline-based solver exploits different search techniques to find the optimal π among the feasible ones.

D. Action planning model for human-robot cooperation
The action planner finds the best ACTION to execute a given TASK t_j^{w_k}. To do so, it gathers all the necessary geometric information from the scene descriptor (e.g., through queries to a database). Then, it determines all the actions that can realize task t_j^{w_k} (i.e., all actions a_i ∈ A such that t_j^{w_k} = S(a_i), according to Definition 5). For example, consider t_j^{w_k} equal to "pick a blue object and place it in an empty box". The action planner uses the tags "blue object" and "empty box" to identify all the locations of the scene to which these labels are assigned. Then, the action planner finds the best sequence of ROBOT MOVEMENTS that connects those locations in the order specified in Figure 2.
We formulate the action planning problem as the identification of the best path on a directed graph G = (U_g, E_g), where U_g is the set of vertices and E_g is the set of edges, and:
• U_g,i is a set of ROBOT CONFIGURATIONS that corresponds to the goal of the i-th ROBOT MOVEMENT;
• E_g,j connects two sets of ROBOT CONFIGURATIONS, i.e., each edge is a set of ROBOT MOVEMENTS.
Indeed, U_g,i includes all the ROBOT CONFIGURATIONS that are logically equivalent for the task (e.g., each one is the location of an equivalent "blue object"). Furthermore, if the representation of the location is in the Cartesian space, many joint configurations may correspond to each Cartesian pose.
For a generic action composed of n movements, the set of vertices is therefore given by U_g = U_g,0 ∪ U_g,1 ∪ ... ∪ U_g,n, where the starting vertex U_g,0 = {q_0} is a single ROBOT CONFIGURATION, since the starting configuration is usually known.
The action planner shall find the shortest path on G from a vertex in U g,0 to a vertex in U g,n .The advantage of this formulation is that it maps a symbolic task into all its possible realizations.Recalling the example of the blue cubes, the best action is optimal for all possible inverse kinematic solutions of all grasping points of all available blue cubes.
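Conceptually, because the graph is layered (one vertex set per movement), the exact shortest path is a dynamic program over the layers that evaluates every edge. The toy sketch below illustrates the problem being solved; the `cost` callback is a stand-in for the true motion-plan cost between two configurations (here a 1-DoF Euclidean distance), and evaluating that true cost for every edge is precisely what makes the exact solution impractical online.

```python
# Illustrative exact shortest path on the layered graph G = (U_g, E_g).
# layers[0] holds the single start configuration U_g,0; layers[i] is the
# goal set U_g,i of the i-th ROBOT MOVEMENT. `cost(q_a, q_b)` is a stand-in
# for a real motion-plan cost (here: 1-DoF distance, an assumption).
def layered_shortest_path(layers, cost):
    # best[q] = (cheapest cost to reach q, the path that achieves it)
    best = {q: (0.0, [q]) for q in layers[0]}
    for layer in layers[1:]:
        nxt = {}
        for q in layer:
            c, path = min(((best[p][0] + cost(p, q), best[p][1] + [q])
                           for p in best), key=lambda t: t[0])
            nxt[q] = (c, path)
        best = nxt
    return min(best.values(), key=lambda t: t[0])


# Toy 1-DoF example: start at 0, two equivalent grasp configurations
# (e.g., two reachable "blue objects"), then two equivalent place slots.
dist = lambda a, b: abs(a - b)
total, best_path = layered_shortest_path([[0.0], [2.0, -1.0], [3.0, 5.0]],
                                         dist)
```

Here the optimum goes through the farther grasp (2.0) because it leaves the robot better placed for the placing movement, which is exactly the kind of coupling that optimizing over all realizations captures.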

III. OPTIMIZATION AND SOLVERS
This section proposes algorithms to solve the task and action planning problems despite their high computational complexity in real-world scenarios. Without loss of generality, we will refer to the case where the workers are one robot and one human.

A. Optimization of collaborative processes
Consider a set of interaction windows Y = {y_j} during which the human and the robot perform some tasks t_i^{p_k} associated with a production target p_k. Let b_{i,j}^H and b_{i,j}^R be binary control variables such that:

b_{i,j}^H = 1 if the human does t_i^{p_k} during y_j, and b_{i,j}^H = 0 otherwise, (1)

and similarly for the robot variable b_{i,j}^R. The robot and the operator can perform only one task during a particular interaction window, namely:

Σ_i b_{i,j}^H ≤ 1, Σ_i b_{i,j}^R ≤ 1, for all y_j ∈ Y. (2)

Each task can be assigned only once during a process and thus executed only by the human or the robot, i.e.,

Σ_j (b_{i,j}^H + b_{i,j}^R) = 1, for all t_i^{p_k}. (3)

TABLE I: Synergy matrix. Each element s_{i,j} represents the increment or decrement of duration given by the simultaneous execution of tasks i and j.
A duration cost function f_d can therefore be defined as:

f_d = Σ_i Σ_j (b_{i,j}^H d_i^H + b_{i,j}^R d_i^R) (4)

where d_i^H and d_i^R are the expected durations of task t_i^{p_k} ∈ T when performed by the human and by the robot separately.
However, (4) does not consider coupling effects between the robot and the human. For example, if the robot and the human move to the same area concurrently, the robot will either stop or slow down for safety reasons. To capture this synergy, we define ∆d_{i,j}^R and ∆d_{i,j}^H as

∆d_{i,j}^R = d_{i,j}^R − d_i^R, ∆d_{i,j}^H = d_{i,j}^H − d_i^H (5)

where d_{i,j}^R is the expected duration of task t_i^{p_k} performed by the robot while the human is performing t_j^{p_k} (and vice versa for d_{i,j}^H). The synergy cost function f_s is therefore defined as:

f_s = Σ_k Σ_i Σ_j s_{i,j} b_{i,k}^R b_{j,k}^H (6)

where s_{i,j} = ∆d_{i,j}^R + ∆d_{i,j}^H is a synergy coefficient, i.e., an index of the coupling of simultaneous tasks, as shown in Table I.
In conclusion, the solution plan π is a solution of the following multi-objective optimization problem:

min_π ( f_d(π), f_s(π) ) (7)

subject to constraints (2) and (3). This paper focuses on time-efficiency criteria, but the extension to other objectives is straightforward and does not undermine the search strategy.
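As an illustration, the duration and synergy objectives (4) and (6) can be evaluated for a candidate assignment as follows. The task names, durations, and synergy values below are invented, and the encoding (a dictionary per interaction window) is our own sketch, not the authors' solver.

```python
# Illustrative evaluation of objectives (4) and (6) for one assignment.
# All data values are made up for the example.
d_H = {"t1": 6.0, "t2": 4.0}                 # expected human durations
d_R = {"t1": 3.0, "t2": 5.0}                 # expected robot durations
S = {("t1", "t2"): 1.5, ("t2", "t1"): 1.5}   # synergy matrix s_ij


def objectives(assign):
    """assign: window -> {"H": task, "R": task} (at most one task per
    agent per window, i.e., constraint (2) holds by construction)."""
    f_d = sum(d_H[t] for w in assign.values() if (t := w.get("H")))
    f_d += sum(d_R[t] for w in assign.values() if (t := w.get("R")))
    # synergy cost: one s_ij term per pair of overlapping H/R tasks
    f_s = sum(S.get((w["R"], w["H"]), 0.0)
              for w in assign.values() if "R" in w and "H" in w)
    return f_d, f_s


# One interaction window: robot does t1 while the human does t2;
# each task is assigned exactly once, so constraint (3) holds too.
plan = {"y1": {"R": "t1", "H": "t2"}}
```

Calling `objectives(plan)` returns the pair of objective values that the multi-objective search of (7) would trade off against other candidate assignments.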

B. Task planning as multi-objective search
The synthesis of a plan π uses a domain-independent refinement search, briefly described in Algorithm 1. Timelines are refined iteratively to solve inconsistencies. We detect flaws in the current partial plan at each iteration, select which flaw to solve, and refine the plan by applying the possible solutions. Each solution determines an alternative refinement and, thus, an alternative partial plan. Unexplored partial plans compose the fringe of the search space and are collected into a dedicated data structure in case of backtracking. A solution is found when the partial plan extracted from the fringe does not contain flaws.
Three points are crucial in Algorithm 1: first, the selection of the flaw to solve for plan refinement (line 5); second, the selection of the next partial plan (line 8); third, the computation of the objective functions.

Algorithm 1 Timeline-based plan synthesis
Φ ← flaws(π, SV, R)
5: for φ ∈ Φ* do
7:     Π ← refine(π, φ.resolvers)
8:     π ← choosePlan(Π, S_RH)
9: return π
1) Flaw selection and refinement: we refer to "flaws" as conditions that affect the completeness or validity of timelines. Flaws may concern tokens to be added to timelines (planning flaws) or tokens of a timeline to be ordered because they overlap (scheduling flaws). This choice determines the way the solving process interleaves planning and scheduling decisions. We implement a hierarchy-based heuristic that considers the synchronization rules of a domain specification [37]⁴.
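The refinement loop of Algorithm 1 can be sketched in simplified form as below. The toy domain (assigning two tasks to a human or a robot, choosing the cheapest partial plan from the fringe) and all function names are invented for illustration; they are not the PLATINUm implementation.

```python
# Simplified, hypothetical sketch of the flaw-driven refinement search of
# Algorithm 1. Partial plans are refined until a flaw-free one is extracted.
def plan_synthesis(pi0, flaws, select_flaw, refine, choose_plan):
    fringe = [pi0]
    while fringe:
        pi = choose_plan(fringe)           # pick the most promising plan
        fringe.remove(pi)
        detected = flaws(pi)
        if not detected:
            return pi                      # flaw-free plan => solution
        # one refined partial plan per resolver of the selected flaw
        fringe.extend(refine(pi, select_flaw(detected)))
    return None


# Toy domain: each task must be assigned to "H" or "R" (a planning flaw is
# an unassigned task); durations below are made up.
TASKS = ["t1", "t2"]
DUR = {("t1", "H"): 6, ("t1", "R"): 3, ("t2", "H"): 4, ("t2", "R"): 5}
cost = lambda pi: sum(DUR[x] for x in pi)
flaws = lambda pi: [t for t in TASKS if t not in {name for name, _ in pi}]
refine = lambda pi, t: [pi | {(t, "H")}, pi | {(t, "R")}]

best = plan_synthesis(frozenset(), flaws, lambda fs: fs[0], refine,
                      lambda fringe: min(fringe, key=cost))
```

With a single (duration) objective, `choose_plan` degenerates to a cheapest-first extraction; the paper's multi-objective setting replaces this with the Pareto-dominance comparison of Definition 6.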
2) Refinement of partial plans: Given the multi-objective nature of the problem, we pursue a Pareto optimality approach [38] and apply the dominance relationship to partial plans.
Definition 6: Given a set {f_1, ..., f_n} of cost functions, a partial plan π_i dominates a partial plan π_j (with i ≠ j) if:

f_k(π_i) ≤ f_k(π_j) for all k ∈ {1, ..., n}, and f_k(π_i) < f_k(π_j) for at least one k.

The dominance condition is used to compare partial plans with respect to heterogeneous objective functions and to identify the Pareto set of the search space. The Pareto set comprises the partial plans representing a suitable trade-off between the objective functions f_k. Such partial plans are compared based on a priority assigned to the objectives. In this work, the objectives are the makespan, f_d, and the synergy, f_s. Among the dominant plans, we prioritize synergy.
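The dominance test of Definition 6 translates directly into code. The sketch below (illustrative, not the authors' solver) implements the test on tuples of objective values and filters a set of candidates down to its Pareto set:

```python
# Pareto dominance per Definition 6: lower objective values are better.
def dominates(fi, fj):
    """True iff plan with objectives fi is no worse than fj on every
    objective and strictly better on at least one."""
    return (all(a <= b for a, b in zip(fi, fj))
            and any(a < b for a, b in zip(fi, fj)))


def pareto_set(plans):
    """Keep the objective tuples not dominated by any other candidate."""
    return [p for p in plans
            if not any(dominates(q, p) for q in plans if q is not p)]


# Toy candidates as (f_d, f_s) pairs; (9, 6) is dominated by (8, 5).
front = pareto_set([(10, 3), (8, 5), (12, 2), (9, 6)])
```

The non-dominated tuples form the trade-off front; the paper then breaks ties among them by objective priority (synergy first).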
3) Cost estimation of partial plans: The implementation of (7) according to the timeline framework needs some intermediate steps.
The objective function must be composed of both a "cost term" and a "heuristic term". Specifically, the cost term models the scheduled tokens of the timelines of a plan, FTL_i ∈ π. Instead, the heuristic term considers possible projections of the timelines. A projection ξ_j^i represents a particular sequence of tokens x_k ∈ ξ_j^i that may complete the timeline FTL_i in future refinements of a plan π. All the possible projections of the timeline FTL_i define a set Ξ_i = {ξ_j^i} for all j. Therefore, the minimization passes through the computation of partial plans whose timelines are not necessarily complete.
Considering (4), the objective function f_d(π) turns into

f_d(π) = Σ_{FTL_i ∈ π} [ Σ_{x_j ∈ FTL_i} d_j + max_{ξ_j^i ∈ Ξ_i} Σ_{x_k ∈ ξ_j^i} d_k ] (8)

⁴ Intuitively, a higher hierarchical level is assigned to SVs appearing in the rules' trigger (i.e., SVs influencing behaviors of other SVs). A lower hierarchical level is assigned to SVs appearing in the rules' body (i.e., SVs whose behavior depends on other SVs).
For each timeline FTL_i, the makespan turns into the sum of the durations d_j of its tokens x_j ∈ FTL_i, plus the maximum sum of the durations d_k of the tokens x_k belonging to the projections ξ_j^i ∈ Ξ_i. Considering (6), the objective function f_s(π) turns into

f_s(π) = Σ_{x_i^R ∈ FTL_R} Σ_{x_j^H ∈ Ω(x_i^R)} s_{i,j} + max_{ξ_j^R ∈ Ξ_R} Σ_{x_i^R ∈ ξ_j^R} max_j s_{i,j} (9)

where FTL_R is the robot timeline, Ω(x_i^R) = {x_1^H, ..., x_n^H} is the set of tokens of the human timeline FTL_H whose execution may overlap in time with x_i^R, and the synergy term s_{i,j} = S_RH[x_i^R, x_j^H] is extracted from the matrix S_RH storing all the synergy terms; it is computed for each pair of overlapping tokens x_i^R and x_j^H. Specifically, the first term of (9) is the cost term, while the second one is the heuristic term. This last term evaluates the possible projections of the robot timeline ξ_j^R ∈ Ξ_R as the maximum expected synergy of the plan according to the worst synergy coefficients.

C. Action planning as a multi-goal motion planning problem
According to Section II-D, the optimal action planning problem consists of finding the shortest path from the current configuration q_0 ∈ U_g,0 to a vertex in U_g,n on the graph G = (U_g, E_g). Each edge in E_g corresponds to a motion planning problem between the configurations associated with the edge vertices.
Solving the shortest-path problem on G is often not viable because evaluating all the edge weights is time-consuming (solving a single motion planning problem may take seconds in a realistic scenario). To achieve an online implementation, we pursue an approximated approach. We decompose the action planning problem into a sequence of sub-problems, i.e., one sub-problem for each ROBOT MOVEMENT of the ACTION. Then, we optimize the sequence of ROBOT MOVEMENTS step by step. The procedure is described in Algorithm 2.
Procedure getGoalsFromScene gets the ROBOT CONFIGURATIONS U_g,1, ..., U_g,n (line 2), where U_g,i is the set of goal configurations associated with the i-th movement. Then, the algorithm solves a motion planning problem from q_current to U_g,i (line 4). The resulting motion plan is a curve σ : [0, 1] → C such that σ(0) = q_current and σ(1) ∈ U_g,i. The curve σ is appended to the array motion_plans (line 8), and the total cost is updated (line 9). Finally, the current configuration is updated with the final configuration of the chosen trajectory (line 10), and the procedure is repeated.
Function getGoalsFromScene queries the database containing all the locations in the environment. If the locations are expressed as Cartesian poses, the function converts them into ROBOT CONFIGURATIONS by applying inverse kinematics. Put simply, the function gets all the configuration goals associated with each movement of an ACTION.
This approximated approach turns each sub-action into a multi-goal motion planning problem (line 4), for which efficient solvers exist. Procedure motionPlanning outputs the minimum-cost path from a starting configuration to any configuration in U_g,i. This problem may be solved by running the motion planner for each configuration in U_g,i and selecting the best solution, although this approach is inefficient and does not scale well with the number of goals. Informed sampling-based planners [39] solve the problem efficiently in a single query.

Algorithm 2 Refining tasks into motion plans
Input: action model, robot tree
Output: motion plans
1: q_current ← getCurrentConfiguration()
2: (U_g,1, ..., U_g,n) ← getGoalsFromScene(action model, robot tree)
3: for U_g,i in (U_g,1, ..., U_g,n) do
4:     (σ, c) ← motionPlanning(robot tree, q_current, U_g,i)
5:     if isEmpty(σ) then
6:         return failure
7:     end if
8:     motion_plans.append(σ)
9:     total_cost ← total_cost + c
10:    q_current ← σ(1)
11: return motion plans
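A hypothetical Python rendering of Algorithm 2's greedy decomposition follows. The `motion_planning` callback stands in for a real multi-goal planner (e.g., an informed sampling-based one); here it is stubbed with a toy 1-D "planner" that simply connects to the nearest goal.

```python
# Hypothetical sketch of Algorithm 2: refine an ACTION into motion plans,
# one multi-goal motion planning sub-problem per ROBOT MOVEMENT.
def refine_action(q_start, goal_sets, motion_planning):
    """goal_sets = (U_g,1, ..., U_g,n).
    motion_planning(q, goals) -> (sigma, cost) or None on failure, where
    sigma is a path from q to the best goal in `goals`."""
    q_current, plans, total = q_start, [], 0.0
    for goals in goal_sets:
        result = motion_planning(q_current, goals)
        if result is None:
            return None              # planning failure -> task-level replan
        sigma, c = result
        plans.append(sigma)          # line 8: store the motion plan
        total += c                   # line 9: update the total cost
        q_current = sigma[-1]        # line 10: sigma(1), final configuration
    return plans, total


# Toy 1-D stand-in for a multi-goal planner: straight segment to the
# nearest goal (a real planner would return a collision-free trajectory).
def toy_planner(q, goals):
    g = min(goals, key=lambda x: abs(x - q))
    return [q, g], abs(g - q)


plans, total = refine_action(0.0, [[2.0, -1.0], [3.0, 5.0]], toy_planner)
```

Note that the step-by-step greedy choice is precisely the approximation: it commits to the nearest first goal before considering the subsequent movements, so the resulting action need not be globally optimal on G.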

IV. QUALITATIVE ASSESSMENT
Existing TAMP approaches for HRC fit different requirements and assumptions, leading to the lack of a shared standard. A quantitative, fair comparison between existing works is difficult because each method is designed to comply with different constraints. In this section, we resort to examples of real-world problems to argue that our methodology can address a wider variety of cases than existing methods.

A. Case studies
We consider a typical end-of-line packaging application, where two collaborative lightweight robots are installed and a human operator can access the cell [40]. The robots must pick the packs from a ball-transfer table and place them into boxes. Storing packs in the boxes must follow certain rules (e.g., a mosaic composition). If needed, the human inspects the packing quality and handles the packs. The packs on the ball-transfer table are randomly positioned. An external camera and an eye-in-hand camera provide a coarse and a refined localization, respectively. In this scenario, we analyze the effectiveness of the most relevant works mentioned in Section I-A. The results are summarized in Table II.
1) Temporally bounded process execution: Consider the non-collaborative case (i.e., no human intervention) in which the robot cannot exceed a maximum execution time owing to plant constraints. The methods in Table II that do not model temporal constraints cannot guarantee the plan constraints. This cannot be guaranteed even if they integrate optimal robot trajectories and optimal robot actions, because action optimality does not imply that the plan is minimum-time.
2) Execution in dynamic environments: Consider the non-collaborative case in which the position of the remaining packs can change after each grasping operation. The methods in Table II that do not implement any contingency strategy or online geometric reasoning need replanning when an unforeseen event occurs. Suppose an action takes longer than expected (e.g., the camera takes longer to identify the grasping position). In that case, temporal constraints as in [10] may be violated unless the plan is robust to the action duration. Conversely, the algorithms with contingency strategies or online geometric reasoning only need a local replanning.
(Table II compares [10]; [19], [20]; [21], [22]; [24], [25]; [26], [27]; [32]; and our approach. ¹ Only offline and if an optimal motion planner is used.)
3) Execution with multiple agents: In the case that two robots are used, a delay in the execution of a task (e.g., due to slow processing of a sensor, occlusions of a camera, etc.) may introduce synchronization issues between the agents.
All the methods in Table II that integrate neither temporal constraints nor contingency strategies can be highly inefficient in the execution. On the one hand, methods that integrate only contingency strategies could cope with the coordination of multiple agents, but a delay in the task execution may violate precedence constraints. On the other hand, methods that integrate only temporal constraints [10] may fail in the execution when a worker exceeds the time limits due to unmodeled events (e.g., occlusions, the hold of a movement for safety reasons, etc.). Enlarging the temporal constraints to compensate for this behavior is not helpful, since this information is used only in the planning phase and not during the execution. This issue can be mitigated if the temporal modeling used in the plan computation is robust to the task execution latency.
4) Optimal execution with multiple agents: Consider that throughput must be maximized to guarantee the economic return of the robotic cell. The methodologies in [8], [9] compute an optimal initial plan. However, synchronization issues would arise when a task takes longer than expected, as discussed in Section IV-A3. Consequently, a delay in grasping (e.g., due to slow camera perception) leads to an optimality loss or to the need to re-compute the whole plan.
5) Optimal execution in dynamic environments: Consider throughput as the goal, and consider that the packs' position can change after each grasping. [8], [9] cannot guarantee the plan's feasibility because the trajectories must be recomputed online without excessive idle times. Although [32] exploits timelines, it fails because the optimal plan is computed offline, based on a probabilistic model. To overcome this limitation, the plan computation should be continuously updated in parallel with the task execution. However, the latency of such a continuous update is high due to the high computational load of the methodology. [24], [25] guarantee local optimality, but movable objects in the scene do not allow for global optimization. Finally, [26], [27] embody a contingency strategy to overcome failures related to misalignments between models and reality. These methods do not generalize to temporal constraints and do not allow for trajectory and action optimization.

B. Discussion
According to the analysis above, our TAMP approach is the best suited to deal with typical real-world requirements. The capability of considering execution issues at both the task and the action/motion planning levels is a crucial advantage. This capability enables reliable coordination of the agents while preserving the optimality of task assignment, action implementation, and the resulting collaboration.
The combination of temporal flexibility, optimal task allocation, and optimal online motion trajectories allows us to preserve coordination and production efficiency through reliable task plans and behaviors. Specifically, timeline-based planning effectively integrates reasoning on allocating tasks to the agents and optimizing production while considering possible deviations at execution time. Furthermore, the action planner evaluates the state of the environment online and computes optimal motion trajectories on the fly.

V. EXPERIMENTAL ASSESSMENT A. Case Study
We consider a case study derived from the EU-funded project ShareWork (http://www.sharework-project.eu), where a robot arm (a Universal Robots UR5 on an actuated linear track) and a human operator have to assemble a mosaic (see Figure 3). We consider four mosaics composed of 4, 9, 16, and 50 cubes of different colors (blue, orange, and white). Each slot has a label given by its column letter and its row number (i.e., A1, A2, ...). A common condition in HRC is that some operations can be performed only by the robot or only by the human. In this example, the following allocation constraints are imposed: orange cubes shall be moved only by the robot; white cubes shall be moved only by the human; both can move blue cubes.
1) Process planning model: The case study is traced back to the timeline-based formalism described in Section II-C. The process is modeled as a production state variable SV_P = (V_P, T_P, D_P, γ_P), where each value of V_P represents the assembly of a row, i.e., V_P = {DoRow_1, DoRow_2, ...}. The precedence of some rows over others may be set at this level as synchronization rules. Two behavior state variables, SV_H = (V_H, T_H, D_H, γ_H) and SV_R = (V_R, T_R, D_R, γ_R), model the low-level tasks that the human and the robot can perform. In this case study, the human and the robot perform tasks of the type PickPlace_x, which consists of picking a cube and placing it in the slot with label x. Hence, V_H = {PickPlace_y} and V_R = {PickPlace_z}, where y ranges over the labels corresponding to white and blue cubes, and z over the labels corresponding to orange and blue cubes. Note that V_R ∩ V_H ≠ ∅ because blue tiles can be assigned to both humans and robots.
According to the controllability notion given in Section II-C, the human behavior is modeled as uncontrollable (γ_H). The duration of each task (D_R and D_H) is estimated as in [41]. This duration estimation method considers the human interference with the robot paths and estimates the duration for all possible obstruction and safety-stop cases. Based on this estimation, each element s_{i,j} of the synergy matrix is computed as in (6).
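As a rough illustration only: the paper computes s_{i,j} via its Eq. (6), which is not reproduced in this section, so the sketch below assumes a simple reading in which synergy is the ratio between a robot task's duration when the human works on a given task concurrently and its nominal duration. All numbers and the formula itself are hypothetical.

```python
# Hypothetical synergy computation (NOT the paper's Eq. (6)): s_ij compares
# the estimated duration of robot task i while the human performs task j
# against the robot's nominal duration for task i.
nominal = {"PickPlaceA2": 8.0, "PickPlaceB1": 6.0}   # robot working alone [s]
concurrent = {                                        # (robot task, human task) -> estimated duration [s]
    ("PickPlaceA2", "PickPlaceA1"): 12.0,             # human obstructs the path
    ("PickPlaceB1", "PickPlaceA1"): 6.0,              # no interference
}

def synergy(i, j):
    """s_ij > 1 means human task j slows down robot task i (bad synergy)."""
    return concurrent[(i, j)] / nominal[i]

print(synergy("PickPlaceA2", "PickPlaceA1"))  # 1.5: obstruction and safety stops
print(synergy("PickPlaceB1", "PickPlaceA1"))  # 1.0: tasks do not interfere
```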
Each task PickPlace_z ∈ V_R corresponds to a set of actions to be performed by the robot. Each action boils down to a sequence of ROBOT MOVEMENTS, as shown in Figure 2. When the action planner receives a task, it decodes the cube's color and the goal slot from a database. Then, it solves the action planning problem described in Algorithm 2.
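The task-to-action decoding step can be sketched as follows. The database layout and helper names are assumptions for illustration; Algorithm 2 itself is not reproduced here.

```python
# Illustrative decoding of a symbolic task into its candidate actions:
# each available cube of the required color yields one possible geometric
# realization (action) of the same symbolic PickPlace task.
SLOT_DB = {"A2": "blue", "B1": "orange"}            # slot label -> cube color (hypothetical)
CUBES = {"blue": ["c3", "c7"], "orange": ["c5"]}    # color -> available cube ids (hypothetical)

def decode_task(task):
    """PickPlace<slot> -> (goal slot, required color, candidate cubes)."""
    slot = task.removeprefix("PickPlace")
    color = SLOT_DB[slot]
    return slot, color, CUBES[color]

print(decode_task("PickPlaceA2"))  # ('A2', 'blue', ['c3', 'c7'])
```

With several candidate cubes per task, the action planner is free to pick the geometrically cheapest realization at execution time.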
2) Software implementation: The task planner in Section III has been integrated into the timeline-based planner PLATINUm [42], using a hierarchy-based heuristic for flaw selection and an HR-balancing search strategy for search expansion [43]. The rest of the framework is implemented in ROS and connected to PLATINUm by rosbridge_suite [44]. PLATINUm dispatches tasks to the robot action planner and waits for feedback. In real-world tests, an HMI system would communicate the tasks to the humans and receive feedback from them. In simulated tests, the human is modeled through a mannequin commanded by a second instance of the action planner used for the robot.
The action planner is implemented by using the library [45], which builds high-level skills on top of MoveIt!. Because of the multi-goal nature of the proposed action planning and the online requirement, we use MI-RRT* [46], a fast variant of Informed-RRT* [39], to solve the motion planning problems.

B. Experiments
We discuss three experiments to evaluate the different components of the proposed approach. The first two analyze the reasoning capabilities of the task planner and the action planner alone. The third one then evaluates their integration within the proposed TAMP approach. We conclude the section with a final discussion of the results, emphasizing the main advantages and strengths of the approach. A video of the experiments is attached to this manuscript.
1) Experiment 1 (task planning performance): We compare our approach with a timeline-based approach [32] and an action-based approach [10] (according to Table II, [32] and [10] are the only approaches that can manage temporal constraints).
a) Comparison with timeline-based approaches: [32] has two main limitations. First, the task planner reasons at a low level of abstraction (i.e., each point-to-point movement is modeled as a TASK), putting the task planner in charge of finding the optimal ordering and assignment of all the ROBOT MOVEMENTS. Second, it is based on the a priori modeling of the trajectories, i.e., all trajectories are pre-computed, and the task planner reasons on the estimated costs of such plans.
To demonstrate that our approach scales better than [32] to complex processes, we consider the mosaics in Figure 3.
[32] models each ROBOT MOVEMENT from a cube to each slot (and vice versa) as a TASK. On the contrary, the proposed method only requires one TASK for each slot. As the planning time grows roughly exponentially with the number of TASKS, the planning time of the proposed method is around one order of magnitude smaller than that of [32], as shown in Figure 4 (right plot). Note that [32] could not solve the 50-cube mosaic within the maximum planning time of 15 minutes.
Similar reasoning holds for the motion planning phase. [32] pre-computes all trajectories from all cubes to all slots and vice versa. Suppose the number of cubes in the scene is equal to the number of slots of the mosaic, and let τ_max be the maximum planning time after which a motion planning query is stopped and b the number of slots. Then, the offline phase can take up to 2·τ_max·b² seconds. Considering that, in our tests, τ_max = 5 s, this corresponds to 90, 360, 1690, and 14440 seconds for the four mosaics. On the contrary, our approach computes the trajectories online, dealing with uncertainty and changing goals.
Concerning the execution phase, we compare the performance of the two approaches for the 4-cube, 9-cube, and 16-cube mosaics. Figure 4 shows that the execution time of the proposed method is significantly shorter (−14%, −12%, −20% for the three mosaics) because the action planner chooses the most convenient movement online, based on the current robot and human state. Indeed, the robot's traveled distance is much shorter (−26%, −39%, −27% for the three mosaics). This difference is even more evident when more cubes than necessary are available. For example, we consider the case where 50 cubes are available although only 4, 9, or 16 are necessary (method "proposed w/ all cubes" in Figure 4). Because our method can choose among a broad set of objects at each PickPlace_x task, it leads to a further improvement of the execution time (−44%, −22%, −34% for the three mosaics) and the robot's traveled distance (−63%, −39%, −46% for the three mosaics).
b) Comparison with action-based approaches: A direct comparison with the other approaches mentioned in Table II is difficult because of the intrinsic differences in terms of planning formalisms and models. Nonetheless, we consider [10] as it supports temporal constraints and hierarchical decomposition. The comparison thus focuses on how the two different models deal with uncertain and strict temporal requirements.
[10] proposes an integration of task and motion planning capabilities based on PDDL2.1 [47]. It uses an action-based representation with so-called durative actions that do not consider scheduling aspects. PDDL2.1 planners do not pursue makespan optimization but use temporal constraints only to check plan consistency. Furthermore, such planners consider neither temporal flexibility nor temporal uncertainty. Therefore, the task planner models pick-and-place tasks with a fixed duration and considers the robot and the worker as controllable. We run our implementation of [10] for the 4-cube, 9-cube, and 16-cube mosaics and show the results in Table III. As expected, [10] achieves the best planning time in all scenarios because the action-based planner focuses on process decomposition without considering optimization aspects. Scheduling decisions do not impact the reasoning, while the lack of flexibility reduces the number of choices considered during the search. Then, we execute the plans obtained for the 4-cube, 9-cube, and 16-cube mosaics, simulating the uncertainty of the worker's actions (δ = ±5 time units). The objective is to evaluate the reliability of the synthesized plans in a realistic scenario where human workers behave uncontrollably. As shown in Table III, the execution time of [10] is greater than that of our approach (up to +110%). The lack of temporal flexibility with respect to uncertainty does not allow the planner to deal effectively with the uncontrollable dynamics of the worker. Consequently, the frequent need for re-planning increases the execution time of the plan, leading to less efficient (and less effective) collaborations.
2) Experiment 2 (action planning performance): Consider the 50-cube mosaic of Figure 3 and the following action-planning configurations: i. pre-computed: motion plans from and to each point are computed offline, as in [32]. Before each movement, the algorithm chooses the closest goal to the current robot position. Referring to Table II, this approach displays optimal robot trajectories, but neither optimal actions nor online geometric reasoning. ii. single-goal: motion plans are computed online, just before their execution. The action planner always selects the closest goal to the current robot position. Referring to Table II, this approach has optimal robot trajectories and online geometric reasoning, but no optimal actions. iii. multi-goal: the proposed action planning. Motion plans are computed just before their execution. The multi-goal optimal motion planner considers all the goals with the desired properties, yielding optimal robot trajectories, online geometric reasoning, and optimal actions.
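The difference between the goal-selection policies can be sketched as follows. This is a toy 2-D illustration, not the paper's planner: `plan_cost` stands in for the path cost returned by a motion planner, and the goal coordinates and obstacle penalty are invented.

```python
# Sketch of the goal-selection policies compared in Experiment 2.
import math

robot_q = (0.0, 0.0)                                     # current configuration
goals = {"g1": (1.0, 0.2), "g2": (0.4, 0.3), "g3": (2.0, 2.0)}

def joint_distance(a, b):
    return math.dist(a, b)

def closest_goal(q, goals):
    """pre-computed / single-goal policy: commit to the nearest goal."""
    return min(goals, key=lambda name: joint_distance(q, goals[name]))

def plan_cost(q, name):
    """Stand-in for a planner's path cost: an obstacle near g2 makes
    the straight-line distance a poor predictor of the actual path."""
    penalty = 10.0 if name == "g2" else 0.0
    return joint_distance(q, goals[name]) + penalty

def multi_goal(q, goals, plan_cost):
    """multi-goal policy: plan toward all goals, keep the cheapest path."""
    return min(goals, key=lambda name: plan_cost(q, name))

print(closest_goal(robot_q, goals))            # g2: nearest, but obstructed
print(multi_goal(robot_q, goals, plan_cost))   # g1: cheapest actual path
```

The toy example mirrors the "closest-one" failure mode discussed below: in a cluttered workspace, the nearest goal can be the most expensive one to reach.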
We evaluate the following indexes: i) total execution time of the robot tasks (in seconds); ii) joint-space distance traveled by the robot (in radians); iii) planning time for the motion planning algorithm, i.e., the sum of the planning times of all movements to perform a pick-and-place action (in seconds).
We run 20 tests for each configuration by using a task plan generated by the feasible configuration described in Section V-B3 (the chosen plan assigns 25 tasks to the robot and 25 tasks to the human). Results are in Figure 5. The multi-goal configuration outperforms the pre-computed and single-goal variants with a reduction of around 48% in the traveled distance (pre-computed: mean = 1487 rad, stdev = 23.7 rad; single-goal: mean = 1480 rad, stdev = 27.1 rad; multi-goal: mean = 768 rad, stdev = 2.21 rad). All configurations use the same optimal path planner; the improvement of the solution is due to the choice of the goal: pre-computed and single-goal direct the search towards the closest goal, which is often sub-optimal. Other heuristics could be adopted to select the goal, but the results would strongly depend on the geometric properties of the workspace. For example, the "closest-one" heuristic would perform even worse in a cluttered environment. On the contrary, multi-goal always finds the best solution regardless of the geometry of the workspace. Shorter paths reduce the execution time, as shown in Figure 5. Note that the difference in the execution time is less pronounced than that in the traveled distance. The reason is that the execution time also includes planning latency, safety slowdowns, and communication overhead, and it is affected by the path parametrization algorithm. Nonetheless, multi-goal leads to a significant reduction (around 21%) in the robot execution time (pre-computed: mean = 632 s, stdev = 12.0 s; single-goal: mean = 629 s, stdev = 8.05 s; multi-goal: mean = 500 s, stdev = 1.27 s). Note also that the multi-goal approach results in a smaller variance of the traveled distance and execution time because the multi-goal search is less affected by the robot state at the beginning of each action.
Finally, the motion planning times are shown in the right plot of Figure 5 (pre-computed is not shown as all its paths are computed offline). As expected, multi-goal has higher planning times than single-goal (single-goal: mean = 0.567 s, stdev = 0.057 s; multi-goal: mean = 2.65 s, stdev = 0.211 s). This difference is intrinsic to the multi-goal nature of the planner, which has to plan toward all available goals. Nonetheless, the planning times of multi-goal are still in the order of a few seconds and, therefore, suitable for online planning. Moreover, this discrepancy is expected to decrease thanks to the constant advances in multi-goal motion planning algorithms.
3) Experiment 3 (TAMP performance): Consider the 50-cube mosaic of Figure 3 and the following cases: i. feasible: we generate feasible random task plans with respect to allocation and precedence constraints but do not optimize duration and synergy. The number of tasks assigned to the human is chosen as a uniform random variable between 12 and 39 (i.e., the smallest and the largest number of cubes that the human can move).
Referring to Table II, this configuration reproduces those methods that do not implement any task optimization, contingency strategy, or temporal robustness. ii. optimized: we use the proposed multi-objective optimization approach. The task planner decides the order and allocation of the tasks. Both configurations use the proposed action planner to highlight the differences owing to the task plan generation.
For optimized, the task planner decides the number of tasks for each worker in such a way as to minimize the process duration. This is a key feature of our approach, while existing works either assume that the number of assignments is given or find a feasible assignment, disregarding its optimality. To reproduce this issue, we let the feasible approach randomly decide the number of tasks assigned to each worker as long as the assignment is feasible. The following indexes are evaluated: i) the process execution time, ET_P = max(ET_R, ET_H); ii) the idle time, IT = 100 · |ET_R − ET_H| / ET_P [%]; iii) the concurrent working time of human and robot, CT = 100 · (min(ET_R, ET_H) − ST) / ET_P [%]; where ET_R and ET_H are the execution times of the robot and the human, and ST is the robot holding time because of safety (i.e., when the human is close to the robot). The first index measures the throughput of the process, and the second and the third measure the quality of the collaboration.
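The three indexes follow directly from their definitions; the sketch below computes them for illustrative execution times (the numeric inputs are made up, not experimental data).

```python
# Evaluation indexes of Experiment 3, computed from their definitions.
def process_metrics(et_r, et_h, st):
    """et_r, et_h: robot/human execution times [s]; st: robot safety-hold time [s]."""
    et_p = max(et_r, et_h)                        # process execution time ET_P
    it = 100.0 * abs(et_r - et_h) / et_p          # idle time IT [%]
    ct = 100.0 * (min(et_r, et_h) - st) / et_p    # concurrent working time CT [%]
    return et_p, it, ct

# Illustrative values: a mildly unbalanced plan with some safety stops.
et_p, it, ct = process_metrics(et_r=500.0, et_h=450.0, st=50.0)
print(et_p, it, ct)  # 500.0 10.0 80.0
```

Note how a balanced assignment (ET_R ≈ ET_H) drives IT toward zero and, for a fixed ST, pushes CT up, which is exactly the behavior of the optimized configuration reported below.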
We run 20 tests for each configuration; results are in Figure 6. The optimized approach outperforms the feasible one by reducing ET_P by around 16% (optimized: mean = 467 s, stdev = 11.9 s; feasible: mean = 557 s, stdev = 39.3 s) and IT by around 95% (optimized: mean = 2.36%, stdev = 1.50%; feasible: mean = 44.2%, stdev = 15.2%), while increasing CT by around 74% (optimized: mean = 79.6%, stdev = 3.26%; feasible: mean = 45.8%, stdev = 12.1%). Note that the optimized approach displays a balanced assignment of tasks to the robot and the human. In this case, the task planner assigns 27 tasks to the human and 23 tasks to the robot, so the two expected makespans are similar. As a result, the execution time and the idle time are shorter.

Fig. 4: Results of Experiment 1.a. The proposed method (proposed) is compared with [32] (baseline). The proposed method reduces the execution time (left plot), the robot's traveled distance (middle plot), and the planning time of the task planner (right plot). The difference is even more evident when more cubes than strictly necessary are used (proposed w/ all cubes).

C. Discussion
The outcomes of these experiments confirm the conclusions of the qualitative assessment. The task planning assessment shows that the timeline-based approach achieves a higher level of reliability during plan execution in the case of uncontrollable delays or temporal deviations in the execution of tasks. It also shows that the proposed approach scales to more complex tasks than previous timeline-based methods (Figure 4).
The action planning assessment shows that the dynamic selection of the motion goal significantly increases the flexibility and reliability of robot motions and achieves shorter execution time and robot traveled distance (Figure 5).
The third experiment focuses on the integration of the two planning approaches and shows its advantages. The results in Figure 6 clearly show that combining optimal reasoning at the two levels of abstraction significantly improves the synergetic behavior of the agents in terms of both idle time and concurrency.

VI. CONCLUSIONS AND FUTURE WORKS
This paper proposed a task and motion planning approach for hybrid collaborative processes. The proposed method follows a multi-objective optimization approach to maximize the throughput of the process. We demonstrated the advantages of the method compared with state-of-the-art techniques, from both a qualitative and a numerical point of view. Future works will focus on integrating learning techniques to refine the process model through experience and speed up the search for optimal plans [48]. For example, [49] presents a preliminary study on learning task duration and human-robot synergy via linear regression. Further investigation will also address the use of other optimization objectives (e.g., taking into account human factors and user preferences [50] or agent-agent communication [51]). Finally, real-world tests with human workers will assess the system's performance and dependability.

Fig. 3: Case study: a 7-degree-of-freedom robot and a human worker collaborate to assemble a mosaic. Four mosaics are considered, with the number of cubes ranging from 4 to 50.

Fig. 5: Results of Experiment 2. The proposed multi-goal action planner is compared with a pre-computed motion planner and a single-goal planner. The path length is reduced (left plot), as is the robot execution time (middle plot). Planning times are larger but suitable for online planning (right plot; pre-computed is not shown because it relies on offline planning).

Fig. 6: Results of Experiment 3. The proposed method (optimized) is compared with a feasibility-oriented approach (feasible): optimized reduces the process execution time (left plot) and the idle time of the human and the robot (middle plot) and increases the time the robot and the human work simultaneously (right plot).

TABLE II: Comparison between the proposed TAMP approach and other works in the literature.

TABLE III: Results of Experiment 1.b. Comparison of the proposed method and [10].