Local and Global Search-Based Planning for Object Rearrangement in Clutter

We propose two algorithms based on local and global searches for a Task and Motion Planning (TAMP) problem, which considers a robotic manipulator to rearrange obstacles and grasp a target in clutter. In the problem, no collision-free path for a robotic manipulator is available unless some obstacles blocking a target are relocated. The two algorithms determine the sequence of obstacles to be removed for grasping a target without collisions. The local search algorithm determines the sequence quickly in an online manner but could be suboptimal in the number of removed obstacles. The global search algorithm based on tree search needs upfront computation but runs in polynomial time. In numerical simulation settings, we consider objects in various shapes, which could make reachable directions bounded. From the simulations the planning time of local search algorithm is faster than that of global search, whereas the global search algorithm removes the less number of obstacles than the local search. In addition, we show that the global search algorithm with a heuristic cost is faster than without the cost but the minimum number of removed obstacles is still obtained from the global search algorithm without the heuristic cost. Practical experiments show the applicability of our algorithms in real environments


I. INTRODUCTION
In a workshop or home environment, a robot could perform manipulation tasks in cluttered environments, where movable objects may occlude each other as seen in Fig. 1 (e.g., bin picking in warehouses). When a target object is occluded by other objects (so called obstacles), it is necessary to rearrange the obstacles and grasp the target without collisions.
In such cluttered environments as Fig. 1 overhand grasping may not be available geometrically so that the approaching motions from the side or front are essential. This approaching may request several sub-tasks to remove some obstacles The associate editor coordinating the review of this manuscript and approving it for publication was Zheng Chen . blocking a target object. When a robot wants to move an object into a certain spot, the sequential actions of pick-andplace are necessary. The more objects to move, the more pickand-place actions are requested. In other words, a smaller number of objects to move could make the number of pickand-place actions decrease. Eventually, reducing the number of robot actions for removing the obstacles affects the efficiency of rearrangement and target grasping. Due to this, the present work focuses on minimizing the number of obstacles to be removed. The rearrangement problem determines the sequence of obstacles to be removed and is known to be computationally difficult as mentioned in [1]. In addition, the problem could become much harder, if the reachable direction to objects is bounded by the shapes of objects, which is dealt with in this work as well. VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ FIGURE 1. An example of cluttered environment. Certain objects need to be removed from the clutter to grasp a target object.
Some previous works propose rearrangement methods based on motion or path planning. Dogar and Srinivasa [2] present a planning framework that generates a straight line path from an end-effector to a target object. Removing the obstacles on the path is planned as sub-tasks and other objects may also be removed to get spaces to place such obstacles removed from the path. Similarly, in [3], the motion of a manipulator is generated first without considering the obstacles and then all the obstacles in collision with the robot are removed. Murali et al. [4] suggest the rearrangement method considering the accessibility between a gripper and a target object. The method is not intended to minimize the rearrangement task to grasp a target object. However, considering the 6-dof of the gripper poses, it finds an obstacle that prevents the manipulator from grasping a target object efficiently in real-world. Sundermeyer et al. [5] also propose a learning based method called Contact-GraspNet to find the grasping points of an object in clutter by considering the object's pose and collisions with other objects. The method calculates the 6-DOF of a gripper with contact points between the gripper and an object, even if an unlearned object is given. Danielczuk et al. [6] propose a method to place an object without collisions in clutter. The method converts objects and an environment scene into point clouds and finds the 6-DOF pose of an object for a gripper to place it safely in the clouds of an environment. Moll et al. [7] suggest the planner that plans a path to a target by pushing all the obstacles on the path aside a bit. The planner generates a path by a random tree expansion whose strategy is a concern about improving the coverage of the tree rather than determining the number of obstacles to be pushed. Papallas and Dogar [8] also suggest the method to grasp a target by pushing objects. When a human commands the start and end points of the end-effector, a robot generates a motion that pushes objects around a target. Ren et al. [9] propose a rearrangement planning algorithm to make a gap between dense blocks and help a fingergripper grasp the target with ease. The algorithm applies a heuristic function to RRT (Rapidly exploring Random Tree) to reduce planning time. Eitel et al. [10] propose a method of singulating each object so that a finger-gripper can grasp the objects by considering the grasping points of the objects in clutter. The method generates a push plan by learning images using CNN. The images represent the expected state when objects are pushed. Pinto et al. [11] and Yuan et al. [12] propose methods that do not directly optimize energy or time for accomplishing a manipulation task but mainly consider the validity of their plans. Therefore, it is observed that some obstacles may be removed unnecessarily. Most of these approaches focus on finding the obstacles to be relocated, not reducing the number of obstacles seriously. In addition, they do not consider such a constraint as the reachable directions of objects that could affect both of task and motion planning in real world. Thus, our goal is to plan as small numbers of sub-tasks as possible for grasping a target object, considering the reachable directions of objects.
Another works aim to optimize the performance of the object rearrangement to be a goal configuration. Han et al. [13] solve an optimization problem for object rearrangement by using a tree search algorithm, which aims at minimizing the number of pick-and-place actions and the travel distance of the end-effector. Krontiris and Bekris [14] and Han et al. [15] also propose the methods that minimize the number of object movements for a rearrangement, where objects are stored in stack-like containers. These methods are similar to the Tower of Hanoi algorithm. Eljuri et al. [16] and Wang et al. [17] also propose rearrangement planning algorithms to relocate objects for a given goal state by checking the motion feasibility of the state. Although their algorithms show high efficiency and solution quality, the problems are not involved with complex motion planning and only considers placing objects on the top of the container. On the other hand, scalability of a planning algorithm is important as cluttered environments have a nontrivial number of objects. Vega-Brown and Roy [18] produce high quality solutions based on a graph search for rearrangement problems, which are to move objects to a designated space by a mobile robot. The environments in the problems are not cluttered due to the small number of objects compared to the size of map, so increasing scalability of them remains unsolved. These rearrangement problems are, however, slightly different from the present one, since the present one is to find what obstacles to be relocated in what order for grasping a target.
We propose two Task and Motion Planning (TAMP) algorithms that decide the obstacles to be relocated. The first one is based on locally searching the obstacles that are most accessible by the end-effector of a robot. The accessibility of obstacles is evaluated by employing Vector Field Histogram+ (VFH+) [19], which finds obstacle-free directions for approaching an obstacle. This algorithm runs fast (less than 0.05 sec to generate a plan for ten objects) and is complete. It could, however, make a suboptimal solution in aspect of the number of obstacles to be removed. We develop another algorithm to find an optimal solution using Breadth-First Search (BFS) of a tree representing possible manipulation actions and resultant states. The optimal solution removes the minimum number of obstacles so that the total execution time can be reduced and the computation time significantly by pruning and ordering as well (e.g., less than 0.6 sec with ten objects). The two algorithms deal with the reachable directions of objects, which are bounded by the shapes of objects and end-effector. We run some simulations for the two algorithms and compare the results. We attach a video clip describing the algorithms and the simulation results as well.

II. PROBLEM DESCRIPTION
In this section, we describe the terminologies, the definition of a direction angle, and the assumptions for the present problem in detail. First, the notations and terminologies used in the paper are given as seen in TABLE 1.

A. CONCEPT AND DEFINITIONS OF DIRECTION ANGLES
We define tree types of direction angles of an object for grasping as seen in Fig. 2. Accessible directions are the angle range that is not blocked by obstacles so that the end-effector can approach an object in the directions without collisions. Reachable directions denote the angle range which is determined according to the shape and size of an object and the workspace of an end-effector. Reachable directions can vary depending on a gripper for the same object. Graspable directions are then obtained by the intersection of accessible and reachable directions, in which the end-effector can approach the object and grasp it without collisions.

B. ASSUMPTIONS
We consider the problem of grasping a target object, which requires a robot to rearrange obstacles and avoid collisions. The problem deals with an environment with N objects including a target object t. The basic assumptions are give as follows: (i) The robot knows the configurations and geometries (i.e., pose, shape, size) of all objects including a target. (ii) The end-effector has no collision-free path to a target without relocating some obstacles. 1 (iii) The removed obstacles are placed outside the environment, which is 1 When a collision-free path to a target exists, our method is able to give that path without requesting rearrangement. predetermined. (iv) Overhand grasping is not allowed, since the present work focuses on manipulating of objects in such environment as seen in Fig. 1. (v) There is at least one object that has enough space for manipulation such that the finger of a gripper has a graspable direction for the object. 2 (vi) Objects can be grasped in different reachable directions owing to their various shapes. The reachable directions of objects are known as a priori.
Among these assumptions, the last may bring more challenges in establishing a task plan for rearrangement, since grasping motions could be constrained significantly by the limited reachable directions of objects. For example, a cuboid with different side lengths as in Fig. 2 would be grasped only in one side (the shorter), if the size of end-effector is not wide enough. Therefore, task and motion planning needs to consider the reachable directions of objects carefully.

C. PROBLEM STATEMENT
In this work, our goal is to remove as few obstacles as possible and obtain a collision-free path for an end-effector to grasp a target object in clutter. Removing an obstacle assigns an additional grasping task and necessarily yields collision-checking between objects and a robot body, which causes significant computational efforts to find valid paths. Therefore, reducing the number of obstacles to be removed could generate less grasping actions and save time for extra collision checks in motion planning, which finally decreases the total execution time for rearrangement. In formulation, we aim to find a sequence of obstacles to be removed O s = where O is the set of all N objects including a target t. The objective of problem is to minimize K and find the corresponding O s , where K = |O s |.

III. METHODS FOR OBSTACLE REARRANGEMENT CONSIDERING GRASPABLE DIRECTIONS
In this section, we describe the two algorithms of LocalSearch and GlobalSearch. Both of them employ Vector Field Histogram+ (VFH+) [19], which has been widely used for mobile robots to find obstacle-free directions for autonomous navigation. We modify VFH+ and use it as a subroutine to find a collision-free direction to an object. We first give a brief description on the modified version of VFH+ and then introduce the algorithms.

A. THE MODIFIED VFH+
The modified VFH+ computes a histogram to represent the graspable directions of an object in two steps. First, it finds obstacle-free angle ranges as seen in Fig. 2 (i.e., accessible directions in blue). Next, the modified VFH+ computes the graspable directions (in green as seen in Fig. 2) by computing the intersection between the accessible and the reachable directions. A robot can take one of the resultant graspable directions to grasp an object. We briefly summarize the computation of the accessible directions using VFH+ (The technical details are referred to [20]).
The key idea is to consider the environment in the perspective of a target object, which is the center circle in green as seen in Fig. 3a. In the coordinate system, the direction from the target object to the robot base is determined as the xaxis and 0 • . From the configuration of objects, the modified VFH+ computes a polar histogram according to the positions and sizes of objects around the target. Considering the sizes of gripper and objects, we have the outline circle of each object enlarged for collision-checking as seen in Fig. 3a. A collision occurs, when the gripper is in the enlarged circles in red. The histogram range is determined by considering the sizes of enlarged circles. The overall histogram considering all objects is computed by the sum of the polar histogram of each object as seen in Fig. 3b. In Fig. 3b, there are three angle ranges in the polar histogram (i.e., -134 •~( -42 • ), -9 •~8 0 • , 24 •~1 72 • ), where non-zero magnitudes of histogram are computed. A single obstacle occludes the target in some ranges so that the magnitude of histogram in the ranges is one. Multiple obstacles could block the target so that the magnitude of histogram is larger than one (e.g. it is two in the range of 24 •~8 0 • in the figure). The angle range with zero magnitude means that there is no object blocking the target in the range. Due to geometrical limitations, the workspace of robot arm is bounded so that the sector of histogram is partially and practically available. For the example in Fig. 3c, the sector of angle ranges from -45 •~4 5 • is the workspace such that the robot cannot reach outside that sector. The valid histogram H of the sector is computed as shown in Fig. 3d. From H , a set of accessible directions can be obtained. If the target object is reachable from any direction (like a cylindrical object), all the accessible directions become graspable directions. If the object has limited reachable directions (which is known as a priori), the graspable directions could change according to the pose of the object.
GraspabilityCheck determines the minimum magnitude of the histogram (H G ) out of the histogram (H ) and its corresponding sector for a given object. The histogram (H ) for a given object is obtained by the modified VFH+ as mentioned before. The inputs of GraspabilityCheck are a configuration of objects O, a target object t, a robot configuration M (i.e., robot kinematics and position), and a parameter d max . The parameter d max is used to determine the area for obstacle detection, which is described in Sec. III-B in detail. Within the reachable directions, the sector having the minimum magnitude of the histogram is obtained as H G , which is the output of GraspabilityCheck. GraspabilityCheck is complete, because the histogram must have at least one minimum magnitude. There are two cases depending on the minimum magnitude of histogram: (i) If the magnitude is zero, there is an obstacle-free direction to the target. Thus, the algorithm can find graspable directions to grasp a target. (ii) If the magnitude is nonzero, there exists at least one obstacle blocking a target. For the example in Fig. 3d, the minimum magnitude of the histogram, H G , is zero. This means that no obstacle blocks the target in the sector so that the target can have a graspable direction in that sector. If the target has a limitation on its reachability in the shape like a box, the sector could shrink more. A time complexity of GraspabilityCheck is linear. GraspabilityCheck calls the modified VFH+ once to compute a histogram H . The modified VFH+ runs on N objects at most in the order of O(N ) in [20]. Then Gras-pabilityCheck runs to find the graspable directions with the minimum magnitude of the histogram. The time complexity depends on the resolution of the histogram sector. If we divide

B. LOCAL SEARCH ALGORITHM
LocalSearch algorithm in Alg. 1 aims to find the obstacles to remove one by one through iteration. If multiple obstacles block a target object, the algorithm recursively runs until it finds graspable obstacles by chaining obstacles from the one near to the target to the one that is graspable. Once the robot removes a graspable obstacle that is not the target, the algorithm is called to find the next obstacle for removing with the updated object configuration (O), which the current obstacle is excluded from. The algorithm is called repeatedly until the target t is removed from O.
The quantity d max determines what obstacles are included for computing the histogram. Initially, d max is given as the Euclidean distance from the target to the farthest obstacle from it. After each step (after removing an obstacle), d max is updated as the Euclidean distance between the target and the obstacle that is most recently removed (line 6). By this update, the obstacles in other directions but the one the robot is currently ''digging'' can be excluded. Figs. 4 shows how to update d max . In the beginning, the Euclidean distance from the target to its farthest object in O is computed (d max,0 in Fig. 4a) so that all the objects in O are initially considered. Once Obstacle A is removed, the distance from the target to Obstacle A (i.e., d max,1 ) is used by GraspabilityCheck to find the next obstacle to remove (Fig. 4b). After removing Obstacle B, d max,2 is computed in the same way. If d max is not updated as shown above, the robot could have removed more obstacles. For example, after removing Obstacle A in Fig. 4b, Obstacle C could be the next, if the algorithm keeps using d max,0 . This is due to the fact that Obstacle C is still included for computing the histogram.
configuration O that is set of all objects remaining in the workspace including t, robot configuration M , distance from a target to a farthest object d max Output: The set of an obstacle to be removed O s , updated d max The minimum magnitude of histogram H G is computed by GraspabilityCheck in line 2. The magnitude of H G represents the minimum number of obstacles to be removed in the direction angles of H G . Therefore, the robot can remove the minimum number of obstacles. If there are multiple obstacles in the sector of H G , the algorithm chains the obstacles from the closest to the target to the farthest. The chain of obstacles begins from the target, since the histogram indicates which obstacle blocks the target. Thus, the closest obstacle o r is chosen for removal in line 3. If the chosen obstacle is not the current target t c , it means other obstacles still remains to be chained. Thus, the algorithm recursively finds the next obstacle in the chain by considering o r as a temporary target (it denotes the current target t c lines [13][14]. If the obstacle o r is the current target t c , it means the magnitude of H G is zero. It is then evaluated whether a motion planning to grasp o r is feasible. If the motion planning is feasible, o r is graspable and the set of O s with a single element of o r is returned (lines 5-8). When the motion planning is infeasible, a new o r that is the next closest to the target in the sector of H G is selected (line 10). This recursion occurs until the algorithm finds a graspable obstacle, which is at the end of the chain. Once the graspable obstacle is removed, Alg. 1 is called to find the next obstacle until the original target (t) becomes graspable. Due to this, the algorithm solves a new removal problem after removing an obstacle, which could be favorable to dealing with such dynamic situations as the poses of objects change or new objects appear.
Three examples of the algorithm are shown in Figs. 5. In Fig. 5a, a sector range of 4 has the minimum magnitude VOLUME 10, 2022 for the histogram H G . Within the sectors, the obstacle o 3 at the bottom is chosen to remove. In the example of Fig. 5b, the sector 3 has the zero magnitude of H G , which means that the target has graspable directions in that sector. If only the target has limited reachable directions in the same configuration, the graspable directions could shrink as seen in Fig. 5c (graspable directions in green on the left and the corresponding histogram on the right).
It is noticed that Alg. 1 is complete and runs in polynomial time as shown in the following theorems.
Theorem 1: Alg. 1 is complete and a robot manipulator eventually grasps a target.
Proof: By Sec. III-A, GraspabilityCheck is complete so that it always returns an object to be grasped (either the target or an obstacle). Therefore, Alg. 1 can always find a graspable object returned by GraspabilityCheck. In each recursive call of Alg. 1, one object is removed. After the recursion is repeated for all N − 1 objects in the worst case (i.e., all obstacles are blocking the target so removed), the last remaining object, which is the target, is grasped. Thus, Alg. 1 guarantees that the target is grasped.

C. GLOBAL SEARCH ALGORITHM
Alg. 1 runs in polynomial time and could take relatively low time cost to find obstacles to remove. However, the solution may be suboptimal in the number of removed obstacles as the algorithm chains obstacles locally without considering global optimum. Thus, we develop a global search algorithm using a tree search method. We define a tree to search a sequence of obstacles to be removed. As shown in Fig. 6, a node in the tree represents a state of objects (i.e., an object configuration). A goal node represents a configuration, where the target has graspable directions without collision. If object o is removed from the current configuration O, a child node is generated to represent the configuration where o does not exist anymore, i.e., O \ o. Thus, the depth of the tree denotes the number of removed obstacles. A tree search algorithm traverses over the tree structure until the goal node is found. In each node, multiple child nodes can be generated, since multiple objects need to be removed. A frontier stores the nodes that will be searched while expanding the tree. Among the nodes in the frontier, the next node is determined by the search strategy for expansion.
We should employ one of the uninformed search strategies, as we do not have any information regarding the goal state so that evaluating the expansion of the tree is not possible. We choose the Breadth-First Search (BFS) to find the optimal solution among many possible ones. BFS searches all the nodes in the same depth and moves to the next depth. If a goal node is found by BFS, the node is at the shallowest depth compared to all the other solutions [21]. In our problem, a deeper depth means removing more obstacles such that the goal at the shallowest depth indicates the minimum number of removed obstacles.
However, BFS has exponential time complexity so that it may run slow to find a goal node. To reduce the practical runtime, we use a pruning and an ordering method. We prune the tree, if a node to be generated has the same state with one of existing nodes. On the other hand, the order of nodes in the frontier could affect the search efficiency significantly. . The concept of a tree structure: Each node represents an object configuration. A child node is generated by removing an object from the configuration. The depth of tree denotes the number of obstacles to be removed. Since BFS explores all the nodes in the same depth, it could work faster if the nodes that are most likely the goal are searched first [22]. After every expansion of the tree, we order the nodes in the frontier (implemented by a First-In-First-Out queue) to remove the obstacle that likely clears the path to the target. The details of the pruning and the ordering method are described below with the pseudocode shown in Alg. 2.
Alg. 2 has a similar input and output with Alg. 1. Since Alg. 1 works in an online manner, it is called repeatedly to remove objects (obstacles) one by one and grasp a target. Thus, Alg. 1 needs to return d max for the next call in addition to the set O s indicating which object to remove. Since Alg. 2 finds the full sequence of the objects to be removed, d max is used and updated inside the algorithm. In addition, an initial node v 0 is an additional input argument to Alg. 2.
In the beginning, the frontiers Q and U and the set of obstacles to be removed O s are initialized. Then the root node v 0 and the initial d max are inserted into the frontiers of Q and U . Before the target is grasped, the following loop repeats (lines 4-34). First, the node v and d max to be expanded are dequeued (line 5-6). For each object in O in the state v, GraspabilityCheck checks if each of the objects has graspable directions (line 8). If an object has such graspable directions (i.e., H G has zero magnitude), the object can be removed. Thus, a new node v + is generated and saved to the frontier (lines 10-11). The quantity d + max is obtained in the same way as done in Alg. 1 for the next computation of GraspabilityCheck. Pruning is done in line 32. When a tree is generated, multiple nodes representing an identical state can be generated. For example, if two graspable objects are removed in different orders, they result in the same state. Duplicating of nodes may cause unnecessary searching efforts and make the algorithm slow. To check the duplication in node generation, we compare all the nodes in the same depth. Since the nodes in other depths have the different numbers of removing objects, those nodes never share the same state.
In line 33, the nodes in the frontier are ordered. As described before, the frontier stores the nodes to search, which are generated by removing an obstacle. Thus, the nodes in the frontier represent the state where each of graspable objects is removed. In Fig. 7, the top row of the right figure shows the frontier without ordering for the configuration of objects on the left. The nodes are searched in the order of B, C, and D. To order the nodes, we consider the reachable directions of the target. Since the end-effector approaches an object in its reachable directions, it is important to consider the objects within the directions first. In each expansion of the tree, we find the reachable directions of the target. In the left figure of Fig. 7, the reachable directions of target are represented in the fan shape. The first node into the frontier represents the configuration, where the corresponding obstacle occludes the target most and is selected for removal (determined by the histogram). The next nodes are determined according to the same manner (i.e., dependent on how much the corresponding obstacle occludes the target). The nodes outside the reachable directions are then inserted into the frontier by the lexicographical order. The bottom row of the right figure in Fig. 7 shows the ordered frontier. The nodes are searched in the order of D, C, and B after ordering.
Once a child node is generated, the algorithm checks if the node is a goal state. In other words, if the corresponding object to be removed is not the target, the same process runs to find another object to remove. If the object to be removed is the target, the algorithm saves the sequence of all the objects(i.e. obstacles) to be removed so far. This is done by chaining the parents nodes back to the root (lines [16][17][18][19][20][21]. The objects in the chain are saved to O s in the reverse order as seen in line 22, (i.e., the order of obstacles from the robot to the target). Then, the algorithm evaluates whether the motion planning for the set O s is feasible. If it is feasible, the algorithm terminates and returns the set O s (lines 23-24). If not feasible, O s becomes empty, and the last elements of frontier Q and U are deleted (lines [25][26][27]. And replanning proceeds until a new goal node is found. Fig. 8 shows an execution example of Alg. 2. In the example, a goal node is found at the depth level 3 so that the two objects as obstacles are removed and the target is grasped in the node order of B-D-G. It is noted that Alg. 2 is complete and runs in exponential time as shown in the following a theorem. VOLUME 10, 2022 FIGURE 8. An execution example of Alg. 2: In the beginning (Node A), the obstacles to be removed are found by GraspabilityCheck. Since o 2 and o 3 are removable and o 1 and t are not graspable in Node A, two nodes B and C are generated from Node A. Their order is determined by the ordering method described above. Since o 2 blocks more widely the reachable directions to the target than o 3 , it is searched earlier. After the two nodes are generated from the root node, the search goes to the next depth because of no more obstacles to be removed in the current depth. It is noticed that Node E is pruned in the next depth, since it is identical to Node D. Finally, the target is grasped in the bottom depth. If motion planning for nodes B-D-G is feasible, the sequence is returned.
Theorem 3: Alg. 2 is complete so that it eventually gives a plan to grasp a target object.
Proof: By Sec. III-A, GraspabilityCheck is complete so it always returns an object to be grasped (either a target or an obstacle). Alg. 2 always expands the search tree by removing the object indicated by GraspabilityCheck. Since we use BFS that is shown to be complete, the solution is always found. Thus, Alg. 2 guarantees a sequence of objects(obstacles) to be removed including a target.

D. HEURISTIC SEARCH
The Breadth-First Search (BFS) in Alg. 2 determines a node v in the first-in-first-out (FIFO) order from Q in dequeue as in [21] (line 5-6). Instead we may apply the heuristic search proposed in [23] to dequeue and select a node by employing a heuristic cost for efficiency as in [22]. The cost for each node is defined as the sum of two H t G values of the current node and its parent node. H t G is H G value of the original target t, which is computed from GraspabilityCheck at each node. The heuristic search determines a node with a minimum cost among the nodes in the frontier since the H t G of the goal node is 0. The simulation results of Alg. 2 using BFS and the heuristic search are compared and investigated in Sec. IV-A4.

IV. EXPERIMENTS
We run simulations and practical experiments to validate our algorithm. In the simulations, we verify the characteristics of each algorithm and compare the proposed algorithms with another algorithm. In the practical experiments, we show that the proposed algorithms could be applicable to real environments.

A. SIMULATIONS
We run numerical simulation in a virtual environment using the dynamic simulator V-REP [24]. In the simulation, we use Algorithm 2 GlobalSearch Input: The original target t, robot configuration M , distance from the target t to its farthest object d max,0 , node v 0 representing the initial configuration that has the set of all objects in the workspace including t Output: O s that includes the obstacles to be removed and the target t for each o ∈ v do 8: if H G == 0 then 10: Generate node v + which represents v except o 11: Enqueue(Q, v + ) 12: Enqueue(U , d + max ) 14: if o == t then 15: x = v +

16
: Kinova JACO1, a 6-DOF manipulator anchored at the base location. The simulation instances of the problem are composed of cluttered and constrained environments as seen in Fig. 9. In the environments, three types of objects which are cylinder, cup and box are used. Considering a typical twofinger gripper as we used, we set the reachable directions of each object to 360 • for a cylinder (all surface), 60 • for a cup (round surface except the handle part) and 30 • for a box (two narrow sides). It is not assumed that the manipulator can approach objects from the top. Obstacles are supposed to be placed on the predefined spots of a shelf next to the robot. All the simulations were run on an Intel i7-7700 3.6 GHz with 16 GB memory.

1) RUNTIME OF GlobalSearch
First, we investigate the runtime of Alg. 2. The algorithm is implemented in python2.7 and Moveit! [25] and RRTconnect [26] of Open Motion Planning Library (OMPL) [27] are used for motion planning. The algorithm is performed with the 30 random object configurations for each case of the varying numbers of objects N = 7, 10, 13 and 16 as seen in TABLE 2. The tree search time increases in proportion to the number of objects N . The motion planning time is also proportional to the tree search time, since the number times of motion planning is in proportion to the depth of the tree. As the number of objects N increases, the probability of failure in motion planning increases. This failure causes more searching efforts to find a node where motion planning is valid such that the runtime of algorithm increases. The reasonable computation time may be obtained by pruning and ordering efficiently. Since this planning is done in advance, no other computation is required during robot's execution. In a practical application, the number of objects would not be prohibitively large, since an environment is bounded, dense, and confined. Therefore, Alg. 2 can be applicable to realworld instances.

2) COMPARING OF LocalSearch AND GlobalSearch
LocalSearch basically uses the same method as in [20] to determine which object to be relocated. The method in [20] was compared with other previous works in [2] and [28], providing the fewer number of removed obstacles. In this paper, we compare the algorithms of LocalSearch and GlobalSearch with a graph search method of [29]. The graph search method finds an obstacles sequence by creating paths between adjacent objects. We test the environments with the number of objects N = 10 and 16 and generate 30 random instances for each. The objects are placed randomly on a 0.6(m) × 0.4(m) planar area. The results are summarized in Figs. 10. The number of removed obstacles (K ) from GlobalSearch is always same as or smaller than one from LocalSearch and the graph search method. Since the graph search method is a global searcher but has fewer search cases than GlobalSearch, results of the method are less than LocalSearch but more than Glob-alSearch. The difference in K among the methods becomes larger as N grows, because LocalSearch and the graph search method have more chances to meet local minima if there are more objects. In Fig. 10a, the number above the graph is the P-value of the t-test. When P < 0.05, results of two methods are interpreted as statistically different. At N = 10, the results of LocalSearch and GlobalSearch are statistically different. At N = 16, there is a statistically significant difference between the results of the three methods. On the other hand, GlobalSearch has longer planning time and the algorithm needs to search again when motion planning is infeasible. However, the total time, which denotes the sum of planning time and robot execution time, for GlobalSearch is smaller than other methods as seen in Fig. 10b. The reduction comes from finding the global solutions, which have smaller K so that the execution of relocating the K objects takes less time. On average, GlobalSearch reduces 24% in the number of removed obstacles such that 23% in the total time is reduced, comparing with LocalSearch.

3) PERFORMANCE OF LocalSearch WITH MOTION PLANNING
LocalSearch is based on the method in [20] and is integrated by adding motion planning in this paper so that some of the searched results might fail in motion planning. The motion planning is employed to avoid collisions so that LocalSearch is the advanced version of the method in [20]. We compare the results of the two algorithms in this section. For balanced comparison, the method in [20] is modified to consider the reachable directions of the object. The success rate of task execution and the algorithm runtime are compared in TABLE 3. The simulation was conducted 30 times for each case of N = 7, 10, 13 and 16. If any other object but the removing obstacles falls down due to collision while a robot executes the task of grasping a target, the task fails. The algorithm runtime was measured only for the successful cases and is summarized in (b) of TABLE 3. When the task is successfully executed (i.e., no collision in motion planning), the number of obstacles to remove is same for both algorithms. This is because the two methods use the same method to determine which object to be relocated. The algorithm runtime of LocalSearch is 82 times longer than that of the method in [20] on average because of the computing time for motion planning. The task executions always succeed in LocalSearch, whereas the failures of the method in [20] happens more frequently as the number of objects increase. In LocalSearch the object to be removed is determined by considering only the gripper and motion planning then determines whether the object can be grasped without full-link collision. When motion planning fails, LocalSearch searches another object. The success rate of the method in [20] decreases for denser environments, because full-link collision is not checked.

4) GLOBALSEARCH WITH BFS AND HEURISTIC SEARCH
Finally, we compare GlobalSearch using the Breadth-First Search (BFS) and the heuristic search. In addition, we compare the algorithms with and without d max . As described in Sec. III-D, the heuristic search employs a heuristic cost, which is defined as the sum of the minimum magnitudes of histogram at the current node and its parent node such that it may search less number of nodes compared with using BFS. This may cause reduction in searching time.  For both of BFS and the heuristic search, d max limits the area around a target and the objects only within that area are involved to compute the histogram. Due to this, using d max could reduce the average K in Fig. 11a and the algorithm runtime in Fig. 11b for the both searches, compared to the results without using d max . BFS with d max removed the fewest number of objects on average as in Fig. 11a. For the case of 16 objects, the heuristic search with d max reduces 70% in the algorithm runtime compared to BFS with d max .
When d max is not applied, the average K 's of BFS and the heuristic search are same in Fig. 11a. The heuristic cost (H t G ) without d max decreases by 1 per the depth of tree because a node is generated for removing each object from outside to inside in the workspace. This may cause the same average K 's for both searches. When using d max , some nodes, where H t G decreases by more than one per the depth of tree, can be generated. This can reduce the number of objects and the histogram. In some cases, BFS can reach a node, where the histogram magnitude converges to 0 (i.e., the target is graspable) earlier than the heuristic search, even any node at the same depth in the sequence does not have a minimum H t G . Therefore, BFS may have the fewer average K than the heuristic search. Since the number of generated nodes in heuristic search is smaller than in BFS, the heuristic search took less time as in Fig. 11b.

B. PRACTICAL EXPERIMENTS
Experiments are conducted to validate the applicability of our two algorithms in a real environment. We use a Panda manipulator which has 7-DOF and a Robotiq 2F-85 gripper which has 2-fingers with 85mm stroke. The experiment instances consist of objects placed randomly on a shelf and a tub to place the objects as seen in Fig. 12. The objects are snack boxes which are different in shape. To obtain the poses and shapes of objects, we use an Intel realsense D435i as a vision sensor and Complex-YOLO [30] for object recognition. Complex-YOLO creates a 3D bounding box that represents the shape and pose of an object with a point cloud. The reachable direction of the object is determined by the stroke of the gripper and the bounding box. The vision sensor is attached to the wrist of the manipulator.
We test our algorithms with the number of objects N = 5, 8 and with 30 random instances for each N . Figs. 13 show the task process of each algorithm with N = 8. We measure the number of removed obstacles (K ) and the success rate. The results of the experiments are summarized in TABLE. 4. GlobalSearch has a lower K compared to LocalSearch. As N increases, the difference in K between the two algorithms increases like the simulation results (Fig. 10a). However, The success rate is higher in LocalSearch than in GlobalSearch. The main cause of the task failure is the uncertainties of the objects' postures. The values of an object recognized by the vision sensor have errors in size or pose. In particular, the error of an object in the back, which is partly hidden from view, is higher than that of an object in the front. LocalSearch selects an object by using the newly updated poses of objects before grasping. On the other hand, Glob-alSearch finds the order of obstacles based on the initially obtained values of objects. Therefore, GlobalSearch has a higher probability of failure to grasp objects during operation compared to LocalSearch.

V. CONCLUSION AND FUTURE WORKS
In this paper, we propose a global search and a local search algorithms to plan the rearrangement of objects in clutter to grasp a target. The two algorithms aim to determine as few obstacles to be removed as possible. Also, they consider the variable reachable directions of objects. The algorithms are evaluated with the varied number of objects and compared in realistic simulation environments.
The global search algorithm finds a global solution through a tree search method. The search strategy used runs in exponential time but its algorithm runtime is not prohibitive, as pruning and ordering are implemented to improve searching speed. Even with the moderate runtime (up to 10 seconds with 16 objects), the number of removed obstacles can be reduced so that the total time (the sum of algorithm runtime and robot execution time) can decrease. On the other hand, the local search algorithm performs faster than the global search algorithm and can work online, which means it searches an object to be remove one by one. If an environment changes dynamically (i.e., a hidden object appears or object recognition is noisy so object poses change frequently as in the practical experiments), the local search algorithm could be more appropriate as it doesn't compute the entire sequence again but remove the next obstacle at each step. Also, the local search algorithm has a polynomial time complexity so it can be adaptive to the size of instances. It is shown that the both algorithms are complete and can guarantee to obtain a solution. For the global search, a heuristic cost was applied to reduce the number of searching nodes such that it could provide a solution close to the global optimum but in less time than using BFS.
To consider the uncertainties of sensing data in the real world, one of the future works may employ an error measurement experiment to estimate the size and posture of an object. This may help our planners to be more reliable in real applications. Another future work is to deal with partially observable environments. If a target or some objects are invisible due to the poses of a camera and other objects, it is necessary to find such objects effectively.