PROTAMP-RRT: A Probabilistic Integrated Task and Motion Planner Based on RRT

Solving complex robot manipulation tasks requires a Task and Motion Planner (TAMP) that searches for a sequence of symbolic actions, i.e. a task plan, and also computes collision-free motion paths. As the task planner and the motion planner are closely interconnected, TAMP is considered a challenging problem. In this paper, a Probabilistic Integrated Task and Motion Planner (PROTAMP-RRT) is presented. The proposed method is based on a unified Rapidly-exploring Random Tree (RRT) that operates on both the geometric space and the symbolic space. The RRT is guided by the task plan, and it is enhanced with a probabilistic model that estimates the probability of sampling a new robot configuration towards the next sub-goal of the task plan. When the RRT is extended, the probabilistic model is updated alongside. The probabilistic model is used to generate a new task plan if the feasibility of the previous one is unlikely. The performance of PROTAMP-RRT was assessed in simulated pick-and-place tasks, and it was compared against the state-of-the-art approaches TM-RRT and Planet, showing better performance.


Alessio Saccuti, Graduate Student Member, IEEE, Riccardo Monica, Member, IEEE, and Jacopo Aleotti, Senior Member, IEEE

Index Terms: Task and motion planning, manipulation planning.

I. INTRODUCTION
Planning sequential robot manipulation tasks depends on both logical relations and geometric constraints. Task And Motion Planning (TAMP) approaches combine a task planner, which searches for a valid sequence of high-level symbolic actions (task plan), and a motion planner, which searches for a collision-free path to reach the goal [1]. The symbolic planner uses logical reasoning to determine a sequence of symbolic actions to reach the goal, which are exploited to guide the motion planner. On the other hand, the motion planner may discover that some symbolic actions are not feasible due to robot kinematic constraints or collisions with obstacles; therefore, the exploration of the configuration space can, in turn, affect the symbolic planner. In this paper, we propose a Probabilistic Integrated Task and Motion Planner (PROTAMP-RRT) based on a unified Rapidly-exploring Random Tree, i.e. an RRT that operates on both the geometric space and the symbolic space. The unified RRT algorithm, like a standard RRT motion planner [2], samples robot configurations and connects them in a tree data structure. However, as the distance between two configurations cannot be easily defined if they refer to different high-level symbolic states, we propose to enhance the RRT with a probabilistic model which is exploited by the task planner to generate a sequence of symbolic actions to reach the goal of the task. Each symbolic action defines a sub-goal configuration of the robot. The RRT expansion is guided towards the sub-goal configuration of the first action in the task plan for which a path has not yet been found. The main contribution is the elaboration of the probabilistic model that evaluates the feasibility of the task plan by estimating the probability that new collision-free configurations will be sampled. The probabilistic model is updated in the expansion phase of the RRT, and it is used to generate a new task plan if the feasibility of the previous one is unlikely. The probabilistic model takes into account probabilities conditioned on the pose of movable objects, so that the task planner can avoid unfavorable object configurations. PROTAMP-RRT defines symbolic actions using significant object poses, i.e. geometric poses of movable objects that are also represented as symbols. In order to limit the computational load, PROTAMP-RRT starts with few significant object poses, but it can add more of them if necessary. To do so, PROTAMP-RRT relies on sample generators which are able to extract new significant object poses, and the corresponding robot configurations to grasp and release objects in such poses. For example, if an object must be picked up from a cupboard and an obstacle obstructs the extraction (Fig. 1), new significant object poses are generated to relocate the obstacle first. We evaluate PROTAMP-RRT on simulated pick-and-place tasks. Results indicate that PROTAMP-RRT outperforms two state-of-the-art TAMP approaches: TM-RRT [3] and Planet [4]. In summary, the contributions of this work are: 1) a probabilistic feasibility model for TAMP, 2) a unified RRT approach that takes advantage of the probabilistic model, and 3) the experimental evaluation against previous works.

II. RELATED WORK
In most TAMP approaches, the task planner generates symbolic plans that are validated using a motion planner. Backtracking occurs when the motion planner fails, and new symbolic plans are generated. In the Task-Motion Kit [5], new symbolic plans are iteratively deepened if the motion planner fails. In [6], sampled symbolic actions are validated by an RRT-Connect motion planner, until the task planner can build a complete symbolic plan. PDDLStream [7] by Garrett et al. introduces streams, i.e. external "black-box" algorithms which add new symbols to the symbolic space. A motion planner is a stream which generates trajectories for abstract actions. These approaches suffer from the fact that sampling-based motion planners, like RRT, may never terminate if the motion planning problem is not feasible. Therefore, these methods either use a simpler motion planner, or determine failures using a timeout, which depends on the complexity of the task.
Our approach of sampling new significant object poses is most similar to PDDLStream [7], where the pose generator is also a stream. Refinement functions in the Task-Motion Kit [5] are a related concept, as they convert an abstract action into a geometric goal. Semantic attachments [8] can also produce new values for numeric fluents when actions are evaluated. The difference in our approach is that new significant object poses are sampled independently of existing actions.
As TAMP is computationally intensive, several methods have been proposed to guide task planning which, unlike our method, do not use feedback from motion planning. Heuristics have been proposed to estimate action feasibility in the task planner, using a database [9], an SVM classifier [10], and reinforcement learning [11]. In order to help the task planner with geometric information, scene graphs [12] and neural networks [13] have also been proposed. Xu et al. [14] use deep learning to predict future long-term affordances in the task planner. A few works [15], [16] focused on transferring learned feasibility constraints from previous executions.
Similarly to our work, a few approaches have used a unified RRT in the combined symbolic and geometric space. The main challenge is guiding the RRT towards the goal, which requires long-horizon planning. In TM-RRT [3] by Caccavale and Finzi, the RRT is guided by a distance function that accounts for the cardinality of the difference of two symbolic states. In Planet [4], a unified RRT is guided using a heuristic which takes into account failures due to collisions, but unlike the proposed PROTAMP-RRT the heuristic does not consider the poses of the movable objects. Another work [17] extracts rich information from the geometric planner, as it identifies "culprits", i.e., problematic sequences of actions. However, [17] operates on a discretized environment, and it defines culprits as logical statements. Similarly, [18] defines geometric constraints in a quantized representation of the parameters. Some TAMP approaches focused on object re-arrangement in order to move or remove a single target object [19], [20], [21]. These methods can make additional assumptions, such as heuristics based on regions [20], and removal of objects to reduce constraints [21]. While our work aims at finding the first geometrically feasible solution, other works [22], [23], [24] focus on finding an optimal trajectory, e.g. in terms of robot execution time, by applying complex optimization approaches. The work in [25] deals with advanced sampling approaches for geometric locations where symbolic transitions may take place. In [26], general sampling-based methods are proposed for TAMP, but unlike our approach the focus is on sampling with dimensionality-reducing constraints.

III. METHOD
A planning problem is defined as the tuple ⟨E, O, S, C, M, A, R, L, G, q_0, s_0⟩, where E is the static environment, O is the set of manipulable objects, S is the symbolic state space of object predicates, C ⊆ ℝ^DoF is the robot configuration space, M is the set of all possible motions connecting two robot configurations q_1, q_2 ∈ C, A is the symbolic action space, L is a set of significant object poses, R is a set of regions where poses in L can be sampled, G is the task goal, q_0 ∈ C is the initial robot configuration, and s_0 ∈ S is the initial symbolic state. A symbolic state s ∈ S may contain any predicate about objects. However, this work deals with predicates that specify the poses of the objects. The set L contains a finite number of significant poses of objects, i.e. geometric poses that are also represented as symbols in the task planner. In each symbolic state s, objects are either assigned to a significant object pose in L, or attached to the robot gripper. When an object is in a significant object pose, the symbolic state may also change if the object is grasped or released by the robot. When a grasped object is moved by the robot, in-between poses are not encoded as significant object poses. Each object in O contains information about its geometry, the available grasping poses, and the regions R where it can be placed by the robot, either as intermediate locations or goal locations. Elements of R are continuous regions where placement of objects is allowed, e.g. flat surfaces. A sample generator that adds new significant object poses to L is associated to each region. The generators sample poses independently of actions or of existing object poses by extracting random poses in regions R, with the requirement that at least one inverse kinematics solution exists to reach an object in that pose, but not that the pose is collision-free. Symbolic actions a ∈ A are operations that change the symbolic state, and in this work they involve moving objects between significant poses. Task goal G is a set of conditions that must be satisfied by the final symbolic state.
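As a rough sketch, the planning problem tuple could be held in a single container. The following Python dataclass is illustrative only: the field names, types, and the generator interface are our assumptions, not the authors' implementation, and the motion set M is left implicit in the RRT edges.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical container mirroring the tuple <E, O, S, C, M, A, R, L, G, q0, s0>.
@dataclass
class PlanningProblem:
    environment: Any                       # E: static collision geometry
    objects: list                          # O: manipulable objects
    actions: list                          # A: symbolic action space (grows over time)
    regions: dict[str, Callable[[], Any]]  # R: region name -> pose sample generator
    significant_poses: list                # L: poses that are also task-planner symbols
    goal: list                             # G: predicates the final state must satisfy
    q0: tuple                              # initial robot configuration in C
    s0: frozenset                          # initial symbolic state in S (set of predicates)

    def sample_pose(self, region: str):
        """Draw a new significant object pose from the region's generator
        and add it to L, enabling new actions to be instantiated."""
        pose = self.regions[region]()
        self.significant_poses.append(pose)
        return pose

# Demo with a stub generator for a hypothetical "table" region.
problem = PlanningProblem(None, [], [], {"table": lambda: ("table", 0.1, 0.2)},
                          [], [], (0.0,) * 6, frozenset())
problem.sample_pose("table")
```

Keeping L mutable matches the paper's idea that significant object poses are added on demand rather than fixed up front.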
The flowchart of the proposed planner is shown in Fig. 2. The main component is the RRT-based motion planner, which operates in a unified state space X = C × S (Section III-A). The motion planner samples new unified states, checks them for collisions, and organizes them into a tree data structure T. A probabilistic model P (Section III-B) is updated whenever a new valid unified state is created or discarded due to collisions with the environment or other objects, in order to evaluate the probability that the motion plan will be able to progress towards the sub-goal of each action. If the probability of the current task plan becomes too low, the motion planner triggers the task planner (Section III-C) to obtain a new task plan, i.e. the sequence of actions α_best that reaches the goal G with the highest probability. The motion planner descends T until it finds the first unreached sub-goal of α_best, and then it extends T towards this sub-goal (Section III-D).

A. Unified State Space
Each symbolic state s ∈ S is a set of predicates that describe the environment. In this work, predicates mainly describe the poses of movable objects. A symbolic state s contains a set of pairs c_i = ⟨o_i, l_j(i)⟩ that associate each manipulable object o_i ∈ O with a (significant) object pose l_j(i) ∈ L (except for the grasped object, if any). The task planner operates on S, where symbolic states are connected by abstract actions in A. Each action a ∈ A has preconditions that must be verified before the action can be applied to the symbolic state, and postconditions that describe the changes applied to the symbolic state. Postconditions include as a sub-goal the final configuration q_end(a) of the robot after the action. Given a sequence of actions α and the initial symbolic state s_0, postconditions can be applied consecutively to determine a sequence of symbolic states {s_0, ..., s_i, ..., s_|α|}, where s_i is the symbolic state after action a_i. Some actions in A are based on action templates which depend on locations and/or objects, such as Move(o, l_s, l_d) that moves object o ∈ O from l_s to l_d, where l_s, l_d ∈ L are significant object poses. When new significant object poses are sampled by the sample generators, new actions are added to A using the new poses.
Four examples of symbolic states are shown in Fig. 3: the initial symbolic state s_0 and states s_1, s_2, and s_3, each with a different assignment of the boxes to the significant object poses l_1, l_2, l_3, l_4 ∈ L. Two actions are defined: action a_1 ∈ A moves Box 1 from significant object pose l_1 to l_2, and action a_2 moves Box 2 from significant object pose l_3 to l_4. Symbolic state s_1 is generated by applying a_1 to the initial state s_0, and symbolic state s_2 by applying a_2. Both actions, in any order, lead to the same final symbolic state s_3.

Fig. 3. Example symbolic states generated by applying actions a_1 and a_2 in different order to s_0. In s_0 (top left), Box 1 is in pose l_1, and Box 2 is in pose l_3. Two initially unoccupied poses are also defined: l_2 and l_4 (dashed).
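The Fig. 3 example can be reproduced in a few lines of code. In this illustrative sketch (the state encoding and helper names are ours), a symbolic state is a dict mapping objects to significant poses, and a Move action template rewrites one assignment after checking its preconditions:

```python
# Minimal sketch of the Fig. 3 example: a symbolic state maps each object to a
# significant object pose, and Move(o, l_src, l_dst) rewrites one assignment.
def move(obj, l_src, l_dst):
    def apply(state):
        # Preconditions: the object is at l_src and l_dst is unoccupied.
        assert state[obj] == l_src, "precondition violated: object not at l_src"
        assert l_dst not in state.values(), "precondition violated: l_dst occupied"
        new_state = dict(state)
        new_state[obj] = l_dst       # postcondition: object now at l_dst
        return new_state
    return apply

s0 = {"Box1": "l1", "Box2": "l3"}
a1 = move("Box1", "l1", "l2")
a2 = move("Box2", "l3", "l4")

# Applying the two actions in either order yields the same final state s3.
s3_via_a1_first = a2(a1(s0))
s3_via_a2_first = a1(a2(s0))
assert s3_via_a1_first == s3_via_a2_first == {"Box1": "l2", "Box2": "l4"}
```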
The RRT motion planner operates on a unified state space X = C × S. Each unified state x ∈ X consists of a single robot configuration q ∈ C and a symbolic state s ∈ S. The RRT-based algorithm samples unified states x ∈ X and organizes them in a tree data structure T, with root at the initial unified state x_0 = (q_0, s_0) and whose edges are motions m ∈ M. The distance between two unified states in the same symbolic state is computed as the Euclidean distance in the robot joint space. Motions may also include changes in the symbolic state. Therefore, given a motion, the sequence of symbolic actions performed up to that motion can be retrieved by traversing the tree upwards to the root. In practice, as the planner needs to efficiently query the sequence of symbolic actions which lead to each motion, each motion holds a reference to the sequence of actions.
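A minimal sketch of the unified state and its distance, under our own simplifying choices: tuples for configurations, frozensets of predicates for symbolic states, and an infinite distance across different symbolic states (since the paper notes that such a distance is not easily defined):

```python
import math

# A unified state x = (q, s): robot configuration paired with a symbolic state.
def distance(x1, x2):
    (q1, s1), (q2, s2) = x1, x2
    if s1 != s2:
        return math.inf  # distance across symbolic states is not comparable
    # Euclidean distance in joint space within the same symbolic state.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q1, q2)))

x_a = ((0.0, 0.0, 0.0), frozenset({("Box1", "l1")}))
x_b = ((3.0, 4.0, 0.0), frozenset({("Box1", "l1")}))
x_c = ((3.0, 4.0, 0.0), frozenset({("Box1", "l2")}))
assert distance(x_a, x_b) == 5.0
assert math.isinf(distance(x_a, x_c))
```

In a real planner the nearest-neighbor query would be restricted to tree nodes sharing the query's symbolic state, which the infinite cross-state distance achieves implicitly.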

B. Probabilistic Model
The purpose of the probabilistic model P is to evaluate the probability that the RRT planner will be able to progress towards the solution of an action a ∈ A, given the knowledge about the past performance of the motion planner on the same action. In particular, the probabilistic model is used to estimate the probability P(a|s) of action a in symbolic state s as the probability of successfully sampling a new robot configuration towards the action sub-goal q_end(a). The new configuration is successfully sampled if it does not collide with the static environment or other movable objects. Therefore, under the assumption of independence of the c_i effects, P(a|s) may be decomposed as:

P(a|s) = P_e(a) P_obj(a|s) = P_e(a) ∏_{c_i} P_c(a|c_i)    (1)

where the product runs over the pairs c_i that hold in s, P_e(a) is the probability that the new configuration does not collide with the static environment, and P_obj(a|s) is the probability that it does not collide with movable objects. In turn, P_obj(a|s) is the product of all P_c(a|c_i), where P_c(a|c_i) is the probability that the new configuration does not collide with object o_i if the latter is in pose l_j(i), for pair c_i = ⟨o_i, l_j(i)⟩. As P_c in (1) depends on the action and only on a single pair c_i, it is used to compute P(a|s) for all symbolic states s where c_i is true. In summary, the probabilistic model P has three elements for each action a: an environmental probability P_e(a), a set of object conditional probabilities P_C(a) = {P_c(a|c_i)}, and a set of solved symbolic states P_S(a).
1) Environmental Probability: the environmental probability P_e(a) is the probability of not colliding with the static environment when sampling a robot configuration towards the action sub-goal q_end(a). The probability is estimated as the fraction of configurations sampled towards the action sub-goal which did not collide with the environment:

P_e(a) = N_e(a) / D_e(a)    (2)

where N_e and D_e are two positive integers. Initially, N_e(a) = N_e0 and D_e(a) = D_e0, where N_e0 and D_e0 are positive integer constants and N_e0 < D_e0. Whenever the RRT discards a sampled configuration due to a collision with the static environment E, P_e(a) is reduced by incrementing D_e(a). Whenever the RRT successfully samples a configuration towards the action sub-goal q_end(a), P_e(a) is increased by incrementing both N_e(a) and D_e(a). After a successful sampling, if P_e(a) < N_min/D_min, we reset N_e(a) = N_min and D_e(a) = D_min, where N_min and D_min are constants (N_min < D_min). This rule restores the probability to a higher value when the RRT samples a robot configuration towards an action sub-goal after many failures, e.g. when a narrow passage is found after many collisions.
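The counter update and the reset rule can be sketched as follows. The constants are placeholders for illustration, not the paper's tuned values:

```python
# Counter-based estimate P_e(a) = N_e / D_e with the reset rule.
N0, D0 = 1, 2          # optimistic prior: the probability starts at 0.5
N_MIN, D_MIN = 1, 4    # floor ratio restored after a success

class CounterProb:
    def __init__(self):
        self.n, self.d = N0, D0

    @property
    def value(self):
        return self.n / self.d

    def record_collision(self):
        self.d += 1                      # failure: only the denominator grows

    def record_success(self):
        self.n += 1
        self.d += 1
        if self.value < N_MIN / D_MIN:   # reset rule: recover after many failures
            self.n, self.d = N_MIN, D_MIN

p = CounterProb()
for _ in range(20):
    p.record_collision()   # e.g. a narrow passage: many failed samples
assert p.value < N_MIN / D_MIN
p.record_success()         # one success restores the floor probability
assert p.value == N_MIN / D_MIN
```

The same update scheme applies to the object conditional probabilities described next, with one counter pair per object/pose pair.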
2) Object Conditional Probabilities: the set of object conditional probabilities P_C(a) for action a defines a probability P_c(a|c_i) for each object/pose pair c_i = ⟨o_i, l_j(i)⟩ in a set C(a) of object/pose pairs. A P_c(a|c_i) represents the probability that a robot configuration does not collide with object o_i in significant pose l_j(i). Set C(a) is initially empty, and it is filled with the object/pose pairs that negatively affect action a as they cause collisions. Similarly to the environmental probability, the object conditional probability is estimated as the fraction of configurations extracted towards the action sub-goal that did not collide with the object o_i when it was at location l_j(i):

P_c(a|c_i) = N_c(a|c_i) / D_c(a|c_i)    (3)

where N_c and D_c are two positive integers. Whenever the RRT fails to sample a robot configuration due to a collision, all colliding objects and their poses are collected in a set of pairs; each such pair c_i is added to C(a) if not already present, and P_c(a|c_i) is decreased by incrementing D_c(a|c_i). Conversely, if the RRT succeeds in sampling a configuration towards q_end(a), probability P_c(a|c_i) is increased by incrementing N_c(a|c_i) and D_c(a|c_i) for all c_i ∈ C(a). Similarly to P_e(a), after a successful sampling, if P_c(a|c_i) < N_min/D_min, then we reset N_c(a|c_i) = N_min and D_c(a|c_i) = D_min.
3) Solved Symbolic States: P_S(a) is the set of symbolic states where the action sub-goal q_end(a) was reached for action a. That is, when the RRT reaches q_end(a) in a symbolic state s, then s is added to P_S(a).
When the probabilistic model is queried to obtain probability P(a|s), it first checks whether s is among the solved symbolic states P_S(a): in this case the action sub-goal has already been reached and the probability is 1. Otherwise, probability P(a|s) is computed according to (1), (2) and (3).
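A possible implementation of this query, with our own data layout (dicts keyed by action, symbolic states as frozensets of object/pose pairs):

```python
# Query P(a|s): 1 if s is a solved state, otherwise the product of the
# environmental term and the conditional terms for the pairs that hold in s.
def prob_action(a, s, solved_states, p_env, p_cond):
    """a: action id; s: frozenset of (object, pose) pairs."""
    if s in solved_states.get(a, set()):
        return 1.0                   # sub-goal already reached in this state
    p = p_env[a]
    for c_i, p_c in p_cond.get(a, {}).items():
        if c_i in s:                 # only pairs that hold in s contribute
            p *= p_c
    return p

p_env = {"move_box": 0.8}
p_cond = {"move_box": {("obstacle", "l2"): 0.5}}
s = frozenset({("obstacle", "l2"), ("box", "l1")})
assert prob_action("move_box", s, {}, p_env, p_cond) == 0.4
assert prob_action("move_box", s, {"move_box": {s}}, p_env, p_cond) == 1.0
```

Pairs never seen in a collision are simply absent from `p_cond[a]`, which is equivalent to a conditional probability of 1, matching the fact that C(a) only stores pairs that hindered the action.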

C. Task Planner
The task planner searches for a task plan, i.e. a sequence of symbolic actions α_best = {a_1, a_2, ..., a_|α_best|} that satisfies the task goal G while maximizing P(α), i.e., the joint probability of performing the entire sequence of actions α:

P(α) = ∏_{i=1}^{|α|} P(a_i | s_{i−1})    (4)

where s_{i−1} is the symbolic state obtained after applying all actions of the task plan up to a_{i−1}. In this work, a tree search approach was used based on A*, which prioritizes the expansion of the node with the highest value of the heuristic:

P(α′) H(s_{|α′|})

where α′ is the partial plan up to that node, and H(s_{|α′|}) estimates the probability to reach the task goal given the final symbolic state s_{|α′|} of α′. For each condition in G (e.g., an object must be in a given location) which is violated by state s_{|α′|}, at least one additional action must be taken by the robot. Hence, given a constant P_H defining the a-priori probability of an action, heuristic H(s_{|α′|}) is:

H(s_{|α′|}) = P_H^{U(s_{|α′|})}

where U(s_{|α′|}) is the number of unsatisfied task goal conditions. A bounded approach was used to speed up the search after the first execution of the task planner, given the previous solution α_prev: nodes where P(α′) < P(α_prev) are pruned, as they cannot lead to an improvement over α_prev.
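The probability-guided search can be sketched as a best-first search over partial plans. This toy version is our own simplification: actions are plain functions returning the next state or None, P_H is fixed, and the bounded pruning against α_prev is omitted. Each node is scored by P(α′)·P_H^U:

```python
import heapq

P_H = 0.5  # placeholder a-priori action probability

def plan(s0, goal, actions, prob, max_nodes=10_000):
    """Best-first search maximizing P(alpha') * P_H ** U(s).
    actions: dict name -> apply(s) returning s' or None if not applicable.
    prob: callable P(a|s)."""
    def h(s):
        return P_H ** sum(1 for g in goal if g not in s)  # U = unsatisfied goals
    # Max-heap via negated score; the counter breaks ties deterministically.
    frontier = [(-h(s0), 0, [], s0, 1.0)]
    counter = 1
    while frontier and counter < max_nodes:
        _, _, alpha, s, p = heapq.heappop(frontier)
        if all(g in s for g in goal):
            return alpha, p                      # goal satisfied: return plan
        for name, apply in actions.items():
            s2 = apply(s)
            if s2 is None:
                continue                         # preconditions not met
            p2 = p * prob(name, s)
            heapq.heappush(frontier, (-(p2 * h(s2)), counter, alpha + [name], s2, p2))
            counter += 1
    return None, 0.0

goal = [("box", "l2")]
actions = {
    "move_box": lambda s: frozenset({("box", "l2")}) if ("box", "l1") in s else None,
}
alpha, p = plan(frozenset({("box", "l1")}), goal, actions, lambda a, s: 0.9)
assert alpha == ["move_box"] and abs(p - 0.9) < 1e-9
```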
Algorithm 2: Probability Update Procedure UpdateProb.
Input: P, E, T, a, q_new, q_goal, C_new, s
Output: P
1: if q_new = q_goal and C_new ≠ ∅ then          ▷ Goal unreachable
2:     SetInfeasible(a)
3: else if C_new ≠ ∅ then                        ▷ Collision
4:     if E ∈ C_new then Decrease(P_e(a))
5:     for all c_i ∈ C_new \ {E} do
6:         Decrease(P_c(a|c_i))
7: else if C_new = ∅ and                         ▷ Success
8:         ∀q|(q, s) ∈ T, |q_new − q_goal| < |q − q_goal| then
9:     Increase(P_e(a))
10:    Increase(P_c(a|c_i)) for all c_i ∈ C(a)

D. Integrated Task and Motion Planner
PROTAMP-RRT searches for a solution with a single execution of the unified RRT planner. The unified RRT motion planner iteratively extends T until a path to a symbolic state that satisfies the task goal G is found. The extension is guided by the most promising sequence of symbolic actions α_best.
The planning algorithm (Algorithm 1) has a main loop (lines 2-22) that consists of two parts, task planning and motion planning. The task planning part (lines 4-11) invokes the task planner to obtain α_best, while the motion planning part (lines 12-19) extends the RRT T guided by the symbolic plan. Task planning (line 5) is executed only if there is no symbolic plan α_best, or if its probability P(α_best) (4) falls below a threshold P_best P_mult, where P_mult < 1 is a constant and P_best (line 11) is the probability of the plan when it was first generated. Once α_best = {a_1, ..., a_|α_best|} is available, the symbolic states traversed by α_best, {s_0, ..., s_|α_best|}, are known. The planner uses ReachableActions (line 6) to descend the motion tree T and find the index r of the first action a_r whose sub-goal has not been reached yet by the RRT. Symbolic state s_{r−1} is obtained by applying all actions of α_best up to a_{r−1}. If the task planner cannot find a plan with probability greater than a threshold (line 7), a new significant object pose is sampled into L from each region in R (line 8), thus adding new actions until a new plan becomes feasible. As new actions are added with an environmental probability P_e < 1, old actions with high probability may still be preferred by the task planner. Sampling new object poses never influences the environmental probability; hence the test at line 7 uses only the object conditional probability of the plan, defined as:

P_obj(α_best) = ∏_{i=r}^{|α_best|} P_obj(a_i | s_{i−1})

where P_obj(a|s) is defined in (1). The dynamic threshold (line 7) is a constant P_low raised to the number of non-reached sub-goals of α_best. At each iteration, the motion planner first sets the current sub-goal q_goal to the sub-goal q_end(a_r) of action a_r (line 12). Then, at line 13, a new configuration q_new is sampled by SampleState, randomly or towards the action sub-goal with probability P_goal. When SampleState is called and q_goal is different from the previous
iteration of the main loop, SampleState always extracts q_goal, to check whether the action is trivially feasible or infeasible. The collision check is performed at line 14, and a set of pairs C_new = {⟨o_i, l_j(i)⟩} of colliding objects and their locations is found. This set is used to update the probabilistic model P (UpdateProb, line 15), which in turn changes the symbolic plan probability P(α_best) for the next iteration. If no collisions are found (line 16), T is extended with q_new (line 17).
If T has been extended to q_goal and r was the index of the last action of α_best, the algorithm terminates because the task goal G is reached (lines 18-19). Otherwise, if T has been extended to q_goal but other actions must be performed, the symbolic state s_{r−1} is added to the set of solved symbolic states P_S(a_r), and P_e(a_r) is set to 1 in the probabilistic model (SetSolved at line 21). Also, as q_goal has been reached, the next action of α_best is reachable, and therefore r is incremented (line 22). Then, the algorithm continues with the next iteration of the main loop.
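The main loop described above can be sketched as follows. This is a heavy simplification under our own assumptions: all components (task planner, sampler, collision checker, tree extension, probability update) are injected as callables, and the significant-pose sampling of lines 7-8 and the SetSolved bookkeeping are omitted:

```python
import random

P_MULT = 0.5   # placeholder constant, P_mult < 1
P_GOAL = 0.3   # probability of sampling toward the current sub-goal

def protamp_rrt_loop(task_plan, plan_prob, reachable_index, sample, collides,
                     extend, update_prob, subgoal_of, n_actions, max_iters=1000):
    alpha, p_best, r = None, 0.0, 0
    for _ in range(max_iters):
        # Task planning part: replan if there is no plan, or if the current
        # plan's probability dropped below P_mult times its initial value.
        if alpha is None or plan_prob(alpha) < p_best * P_MULT:
            alpha = task_plan()
            p_best = plan_prob(alpha)
            r = reachable_index(alpha)   # first action whose sub-goal is unreached
        # Motion planning part: sample, collision-check, update model, extend.
        q_goal = subgoal_of(alpha, r)
        q_new = q_goal if random.random() < P_GOAL else sample()
        c_new = collides(q_new)          # colliding object/pose pairs (empty if free)
        update_prob(alpha[r], q_new, q_goal, c_new)
        if not c_new:
            extend(q_new)
            if q_new == q_goal:
                if r == n_actions(alpha) - 1:
                    return alpha         # task goal reached
                r += 1                   # advance to the next sub-goal
    return None

# Deterministic demo with stub components: one action whose sub-goal is (1.0,).
random.seed(0)
extended = []
plan_found = protamp_rrt_loop(
    task_plan=lambda: ["a1"],
    plan_prob=lambda alpha: 1.0,
    reachable_index=lambda alpha: 0,
    sample=lambda: (0.0,),
    collides=lambda q: [],
    extend=extended.append,
    update_prob=lambda *args: None,
    subgoal_of=lambda alpha, r: (1.0,),
    n_actions=len,
)
```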
In UpdateProb (Algorithm 2), if the sampled state q_new is the action sub-goal and it is in collision (line 1), then the action is set as infeasible, regardless of which colliding object/pose pairs are present in C_new (line 2). If the sampled state q_new is not the action sub-goal and it is in collision (lines 4-6), then the obstacles are hindering the action, and probabilities P_e(a) and P_c(a|c_i) are decreased as explained in Sections III-B1 and III-B2. Probabilities are increased (lines 7-10) if q_new is not in collision and it is the nearest configuration to q_goal among the configurations in T with symbolic state s.
The UpdateProb procedure provides negative feedback if the RRT fails to expand towards the current action sub-goal. In particular, in Algorithm 2 (line 8), the probability is increased only if the new configuration is closer to the action sub-goal than every previous configuration. The probability of finding configurations which are closer to the action sub-goal decreases exponentially with the number of configurations in T. While expansion towards the same action sub-goal is attempted and fails, the probability P(α_best) is reduced, and it will eventually fall below P_mult multiplied by its initial value. When this occurs, the task planner is called again, to find a (potentially) different plan with higher probability.
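The feedback rule of UpdateProb might look as follows in Python. The model interface (set_infeasible, decrease_env, decrease_obj, increase) is our own stand-in for the counter updates of Section III-B:

```python
import math

# Sketch of Algorithm 2: increase only when the collision-free sample is the
# closest one to the sub-goal seen so far in this symbolic state.
def update_prob(model, a, q_new, q_goal, c_new, tree_configs, env_id="E"):
    if q_new == q_goal and c_new:
        model.set_infeasible(a)              # the sub-goal itself is in collision
    elif c_new:
        if env_id in c_new:
            model.decrease_env(a)            # collided with the static environment
        for c_i in c_new:
            if c_i != env_id:
                model.decrease_obj(a, c_i)   # collided with object o_i at l_j(i)
    else:
        dist = lambda q: math.dist(q, q_goal)
        if all(dist(q_new) < dist(q) for q in tree_configs):
            model.increase(a)                # real progress toward the sub-goal

# Tiny recording model used only to exercise the sketch.
class LogModel:
    def __init__(self):
        self.log = []
    def set_infeasible(self, a):   self.log.append(("infeasible", a))
    def decrease_env(self, a):     self.log.append(("dec_env", a))
    def decrease_obj(self, a, ci): self.log.append(("dec_obj", a, ci))
    def increase(self, a):         self.log.append(("inc", a))

model = LogModel()
# A free sample at (0.5,) is closer to the sub-goal (1.0,) than the only
# existing tree configuration (0.0,), so the probabilities are increased.
update_prob(model, "a1", (0.5,), (1.0,), [], [(0.0,)])
model2 = LogModel()
update_prob(model2, "a1", (0.5,), (1.0,), ["E", ("box", "l2")], [(0.0,)])
```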

IV. EXPERIMENTS

A. Experimental Setup
The PROTAMP-RRT approach was evaluated in simulated manipulation tasks in which a robot moves boxes. The robot is a fixed Universal Robots UR10 manipulator (with a reach of 1300 mm). A parallelepiped of size 0.1 × 0.1 × 0.11 m was used to simulate an OnRobot VGC10 vacuum gripper. We assumed that the gripper is able to perform grasping by contact with any flat side of the objects, disregarding dynamic effects. The simulated experimental environment was implemented on top of ROS (Robot Operating System) and visualized in RViz. The software was implemented in C++, using the OMPL planning library [27]. The MoveIt planning framework and the Flexible Collision Library [28] were used to instantiate the geometry and check for collisions. The IKFast OpenRAVE solver was used to compute inverse kinematics solutions.

B. Manipulation Tasks
We tested PROTAMP-RRT on simulated pick-and-place tasks. Two action templates were considered, i.e. pick and place (Fig. 4). For each action, the probabilistic model P defines the environmental probability, the object conditional probabilities, and the set of solved symbolic states. A pick action represents the action of approaching an object while the gripper is empty, and then attaching the object to the gripper. This action is parameterized by the target object o_t, the significant object pose l_g of o_t, and the robot configuration q_g to grasp the object in l_g. For each object o_t and pose l_g, multiple pick actions are generated with different configurations q_g, as the robot may grasp the object from multiple sides and each side may be reached with a different inverse kinematics solution. The preconditions imply that no object is attached to the gripper (o_attached = ∅) and that o_t is in l_g. As a result of the action, the final robot configuration q_end is q_g and o_t is attached to the end effector. A place action represents the action of moving a grasped object to a new location, and then releasing the object. This action is parameterized by the target object o_t, the initial significant object pose l_p, the final significant object pose l_g, the initial robot configuration q_p, and the final robot configuration q_g. As for the pick actions, for each o_t, l_p and l_g, multiple place actions are generated with different configurations q_p and q_g, taking into account multiple inverse kinematics solutions. The preconditions are that the initial robot configuration is q_p, o_t is grasped and in significant object pose l_p, and the l_g pose is not occupied by other objects. In the postconditions, the final robot configuration q_end is q_g, o_t is in l_g, and no object is grasped.
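The two action templates could be encoded as follows. The state layout (attached object, object-to-pose map, current configuration) and the field names are our assumptions for illustration:

```python
from dataclasses import dataclass

# State layout: {"attached": obj or None, "poses": {obj: pose}, "q": configuration}.
@dataclass
class Pick:
    o_t: str      # target object
    l_g: str      # significant pose of o_t
    q_g: tuple    # grasp configuration (one per grasp side / IK solution)

    def applicable(self, state):
        # Preconditions: gripper empty, target object in l_g.
        return state["attached"] is None and state["poses"].get(self.o_t) == self.l_g

    def apply(self, state):
        # Postconditions: object attached, robot at the grasp configuration.
        return {"attached": self.o_t, "poses": dict(state["poses"]), "q": self.q_g}

@dataclass
class Place:
    o_t: str
    l_p: str      # pose the grasped object currently occupies
    l_g: str      # destination pose
    q_p: tuple    # configuration at the start of the move
    q_g: tuple    # configuration after releasing at l_g

    def applicable(self, state):
        # Preconditions: o_t grasped at l_p, robot at q_p, l_g unoccupied.
        return (state["attached"] == self.o_t
                and state["q"] == self.q_p
                and state["poses"].get(self.o_t) == self.l_p
                and self.l_g not in state["poses"].values())

    def apply(self, state):
        # Postconditions: object released in l_g, robot at q_g.
        poses = dict(state["poses"]); poses[self.o_t] = self.l_g
        return {"attached": None, "poses": poses, "q": self.q_g}

s = {"attached": None, "poses": {"box": "l1"}, "q": (0.0,)}
pick = Pick("box", "l1", (0.3,))
place = Place("box", "l1", "l2", (0.3,), (0.7,))
assert pick.applicable(s)
s = pick.apply(s)
assert place.applicable(s)
s = place.apply(s)
assert s == {"attached": None, "poses": {"box": "l2"}, "q": (0.7,)}
```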
In total, 6 manipulation tasks were simulated (Fig. 5, also shown in the attached video). In the Transfer 1, Transfer 2 and Transfer 3 tasks, red objects O_1, O_2 and O_3 must be relocated into significant object poses sampled from the Place table planar region. In Transfer 1, no obstacles are present between initial and goal locations, while in Transfer 2 and Transfer 3 walls have been added around the objects. The robot is allowed to grasp objects only from above, and significant object poses can be sampled in the Place table region only. In the Extraction task (Fig. 6), red objects O_1 and O_2 must be relocated into the Place table region. Initially, grasping is infeasible due to the obstructing objects (blue boxes) O_3 and O_4, which have to be relocated. The robot can grasp each movable object from above or from the front side. Significant object poses for the red objects can be sampled in the Place table region, while for the blue objects they can be sampled in the Pick table region. In the Regrasping task, the red box O must be relocated on top of the Shelf region. This task requires regrasping, as initially the object can be picked only from above, and it can be placed in the deposit region Shelf only if it is grasped from the side, due to obstacles. Movable objects can be grasped from above or from the front side. Significant object poses can be sampled in the Shelf region, and in the Table region where re-grasping may occur.
Finally, in the Culprit task, red box O_1 must be extracted from the cupboard and relocated in the Table region. To do this, panel O_2 must be removed, as the opening is slightly too small otherwise. This task is challenging as the planner can be stuck in a strong local maximum: the robot is able to pick and move O_1 within the cupboard, and the planner can therefore successfully sample many robot configurations without moving O_2; however, none of these samples lead to a solution of the task. Significant object poses for the red object can be sampled in the Table region, and for O_2 they can be sampled in the Floor region. An example is shown in Fig. 1.

Fig. 7. Avg. planning time (µ_time), avg. number of robot configurations (µ_q), avg. number of symbolic states (µ_s) (for Planet, TM-RRT, PROTAMP-RRT).

C. Experiments in Finite Symbolic Space
Our method was compared against two other unified RRT-based approaches, TM-RRT [3] and Planet [4], on the six manipulation tasks: Transfer 1 (T1), Transfer 2 (T2), Transfer 3 (T3), Extraction (Extr.), Regrasping (Regr.) and Culprit (Culp.). As TM-RRT can operate only on a finite symbolic space, we pre-defined a minimal finite set of significant object poses sufficient to solve the planning problem. Hence, sampling of new significant object poses in PROTAMP-RRT was disabled to ensure a fair comparison. For the same reason, the goal of each task was not defined in terms of regions, but as a specific goal pose for each object. Planet was also modified so that actions did not sample their sub-goal randomly, but selected it among the pre-sampled poses. For each task and each method, planning was repeated 50 times. The average planning time (µ_time), the average number of robot configurations (µ_q), and the average number of symbolic states (µ_s) are reported in Fig. 7. Planet and TM-RRT sampled more unified states in X, as evidenced by the higher values of µ_q and µ_s. Instead, thanks to the probabilistic model P, PROTAMP-RRT focused on the symbolic actions with the highest probability, and therefore resulted in a lower planning time µ_time. The only exception is the Transfer 1 task, where no obstacles were present and motion planning was trivial. Although in Transfer 1 PROTAMP-RRT still explored fewer symbolic states (lower value of µ_s), both Planet and TM-RRT resulted in a lower planning time µ_time. The main reason is that these algorithms use heuristics to guide the RRT instead of a long-horizon task planner.
The success rate as a function of planning time is shown in Fig. 8. The success rate of Planet and TM-RRT is generally lower than that of PROTAMP-RRT. At the maximum tested time limit of 600 seconds, Planet reached a 100% success rate only in Transfer 1 and Extraction, while TM-RRT did so only in Transfer 1, Transfer 2, Transfer 3 and Extraction. In particular, Planet and TM-RRT struggled the most in the Regrasping and Culprit tasks. Instead, our method reached a 100% success rate in all tasks except the Culprit task, where the success rate was 98%, compared to 22% for TM-RRT and 10% for Planet. The proposed probabilistic model is very effective in the solution of the Culprit task, as it is able to discover that, even though the red box can be picked and moved within the cupboard, it cannot be extracted until the blue panel is moved.

Fig. 8. Planet, TM-RRT and PROTAMP-RRT average success rate with respect to the planning time limit, in a finite symbolic space.

Fig. 9. PROTAMP-RRT in unlimited symbolic space. Left to right: avg. planning time (μ_time), avg. number of robot configurations (μ_q), avg. number of symbolic states (μ_s), avg. number of sampled significant poses (μ_l).

D. Experiments in Unlimited Symbolic Space
In this section, the full PROTAMP-RRT algorithm is evaluated in an unlimited symbolic space, without a pre-defined set of significant object poses, so that new significant object poses are sampled to reach the goal. Results are shown in Fig. 9. Compared to PROTAMP-RRT in Fig. 7, where the significant object poses were pre-defined, the higher complexity of the task increased the planning time μ_time, the number of robot configurations μ_q, and the number of symbolic states μ_s by about an order of magnitude. The only exception is the Regrasping task, where the results are similar.
Fig. 9 also reports the average number of sampled significant object poses (μ_l). As new significant poses are sampled only when the probability of the task plan is very low, PROTAMP-RRT limited the expansion of the symbolic space. The task in which the most significant object poses were sampled is the Extraction task, where four movable objects were present. However, not all resulting symbolic states were explored and added to T, as evidenced by μ_s, which is comparable to μ_s for the Transfer tasks. This is likely an effect of the probabilistic model, which determined which symbolic states should be avoided. Fig. 10 shows the PROTAMP-RRT success rate as a function of the planning time limit. Even if the results are worse than in Fig. 8, PROTAMP-RRT successfully solved most of the trials after about 400 seconds. The results confirm that the Extraction task seems to be the most challenging.
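The gating described above, where new significant poses are sampled only when the current task plan has become unlikely, can be sketched as follows. This is a hypothetical illustration: the function name, the threshold value, and the callback signatures are assumptions, not the authors' code.

```python
# Threshold below which the current task plan is considered unlikely to be
# feasible; the value is illustrative and not taken from the paper.
REPLAN_THRESHOLD = 0.05

def maybe_expand_symbolic_space(task_plan, plan_probability, sample_pose):
    """Gate symbolic-space expansion on the estimated task-plan probability:
    only when the current plan is unlikely to be feasible is a new
    significant object pose sampled (which would trigger task replanning)."""
    if plan_probability(task_plan) >= REPLAN_THRESHOLD:
        return None  # keep refining the current plan
    return sample_pose()  # expand the symbolic space with a new pose
```

Because the symbolic space only grows when the probabilistic model signals that the current plan is a dead end, the number of sampled poses μ_l stays small even in tasks with several movable objects.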
Further experiments and data are available at https://rimlab.ce.unipr.it/Saccuti.html, in the Material section.

V. CONCLUSION
An integrated TAMP approach based on a unified Rapidly-exploring Random Tree was proposed, which operates on both the geometric space and the symbolic space. The method introduces a probabilistic model that estimates the feasibility of symbolic actions by evaluating the probability that new collision-free configurations will be sampled. The PROTAMP-RRT method was evaluated in several manipulation tasks, showing better performance than TM-RRT and Planet in terms of planning time and success rate. Moreover, the ability of PROTAMP-RRT to expand the symbolic space was evaluated, indicating that it is able to solve planning problems by sampling only a few new significant object poses. Future work will involve the investigation of a more advanced symbolic planner to evaluate logical statements other than object/pose pairs.

Fig. 1. Complex manipulation task where the red box must be extracted from a cupboard. Before the extraction, the blue panel must be relocated.

Fig. 4. Definition of Pick and Place symbolic actions.