Cooperative Pathfinding Based on Memory-Efficient Multi-Agent RRT*

In cooperative pathfinding problems, non-conflict paths that bring several agents from their start location to their destination need to be planned. This problem can be efficiently solved by Multi-agent RRT*(MA-RRT*) algorithm, which is still state-of-the-art in the field of coupled methods. However, the implementation of this algorithm is hindered in systems with limited memory because the number of nodes in the tree of RRT* grows indefinitely as the paths get optimized. This paper proposes an improved version of MA-RRT*, called Multi-agent RRT* Fixed Node(MA-RRT*FN), which limits the number of nodes stored in the tree of RRT* by removing the weak nodes on the path which are not likely to reach the goal. The results show that MA-RRT*FN performs close to MA-RRT* in terms of scalability and solution quality while the memory required is much lower and fixed.


I. INTRODUCTION
The problem of planning a series of routes for mobile robots to destinations and avoiding collisions can be modeled as a cooperative pathfinding problem.Traditionally, this problem is often simulated in highly organized environments such as grids, which include several obstacles and agents.To find the paths of these agents, the straightforward method is looking for the answer in a joint configuration space which is composed of the state spaces of single agents.Such a space is typically searched using a heuristic guided function such as A* [1].However, the problem of cooperative pathfinding has been proved to be PSAPCE-hard [2].
To solve the cooperative pathfinding problems, many works have been proposed in the last decades.All these methods can be divided into three categories: decouple method, coupled method, and hybrid method.Each method has its disadvantages, for example, the computational cost of coupled approaches are susceptible to the increases of agents, while the decoupled methods cannot guarantee their completeness.The hybrid approach, which inherits the advantages of the coupled and decoupled approach, seems promising.But when the decoupled planner fails, it may be more time-consuming than just using a single planner.
After Karaman and Frazzoli introduce an asymptotically optimal algorithm RRT* [3], Čáp marries it to the classical multi-agent motion-planning algorithm and proposes Multiagent RRT*(MA-RRT*) [4].MA-RRT* is a coupled algorithm.But, unlike other coupled approaches, it alleviates the increase of computational cost as the number of agents increases by leveraging the idea of the Monte Carlo method.As a result, it can solve the multi-agent path planning problem efficiently.Besides, MA-RRT* comes close to the decouple planner in the efficiency while still maintaining the completeness and optimality, which makes MA-RRT* still advanced in the field of coupled method.
There are many state-of-the-art works that aim to improve the MA-RRT*, such as [5] in 2019, which improves the efficiency of MA-RRT* at the expense of optimality and completeness.Unlike MA-RRT*, [5] applies RRT* for each agent in turn, and the agents whose paths have been planned by RRT* are treated as moving obstacles.
However, the application of the MA-RRT* is hindered in systems with limited memory, because as the solution gets optimized, the number of nodes in the tree grows indefinitely.The closest work to this problem is the RRT* Fixed Nodes(RRT*FN) proposed by Adiyatov [6], which only focuses on improving the memory efficiency of RRT*.Up to now, none of the works aims to limit the memory required for the MA-RRT* algorithm.
This paper presents a new MA-RRT* based algorithm, called Multi-agent RRT* Fixed Nodes (MA-RRT*FN), which works by employing a node removal procedure to limit the maximum number of nodes in the tree.The property of our algorithm can be observed in Fig. 1, which shows the two search trees for single-agent navigation using MA-RRT* and MA-RRT*FN respectively in a 2D grid map with the same number of iterations.It can be seen that the trees MA-RRT*FN generated are more sparse than MA-RRT*.
The main contributions of this paper are as follows: 1) The proposed MA-RRT*FN requires a fixed memory, which is much less than MA-RRT* whose memory cost grows indefinitely, while its scalability and convergence rate are very close to MA-RRT*.2) The informed-sampling MA-RRT*FN, which is the improved version of MA-RRT*FN, performs very similarly to isMA-RRT* concerning the suboptimality of solutions, while its convergence rate and scalability are better than isMA-RRT*.

II. RELATED WORK
The methods for solving Cooperative pathfinding can be classified into three categories: coupled method, decoupled method, and hybrid method.
1) coupled approaches: In coupled approaches, all agents' routes are computed as a union.The algorithm searches agents' joint configuration space to find the solution, which can provide a stronger guarantee on the feasible path.For example, Standley [7] proposes two techniques, called Independence Detection(ID) and Operator Decomposition(OD).The combination of these two techniques, the ID+OD algorithm, which is capable of solving relatively large problems in milliseconds, is both complete and optimal.Standley then refines the algorithm into an anytime algorithm called Optimal Anytime(OA) [8], which first finds out a solution rapidly, and then utilizes any spare time to improve that solution incrementally.
Alternative methods such as [9] and [10] model the Cooperative pathfinding problem as Integer Linear Programming (ILP) and Boolean Satisfiability (SAT) problems respectively.In [9], Jingjin establishes a one-to-one solution mapping between multi-robot path planning problems and a special type of multiflow network and uses the Integer Linear Programming to optimize the four goals: the makespan, the maximum distance, the total arrival time, and the total distance.
However, All these coupled approaches are susceptible to the increases of agents.The computation cost of these techniques increases dramatically as the increase of robots.
There are also many attempts to use the sampling-based method, such as RRT [11][12] [13], PRM [14], to solve multiagent motion planning problems.These algorithms alleviate the increase of computational cost as the number of agents increases by leveraging the idea of the Monte Carlo method.However, for the reason that they are based on the RRT and PRM, these algorithms are not optimal.After the RRT* is proposed, Čáp combines RRT* with OA and proposes MA-RRT* [4].Unlike other sampling-based method, MA-RRT* is both optimal and complete, and it outperforms many classical algorithms such as ID+OD and OA in terms of efficiency and scalability.
2) decoupled approaches: In decoupled approaches, all agents' paths are planned individually.For example, in [15], David Silver introduces three decoupled approaches which decompose the problem into several single-agent navigations: Local Repair A*(LRA*), Hierarchical Cooperative A*(HCA*) and Windowed Hierarchical Cooperative A*(WHCA*).In [16][17] [18][19], the path of each agent is computed individually based on the pre-assigned priorities.The same case can also be seen in recent work [5] [20].
Alternative method utilizes conflict based search(CBS) [21] to find the solution.Such as the ByPass-CBS and Continuous-Time-CBS proposed by recent work [22] and [23], which pushes the performance of CBS further.
Although those decoupled methods can efficiently find the solution, their completeness cannot be guaranteed.
3) hybrid approaches: The hybrid approaches, which leverage the strengths of both coupled and decoupled techniques, find the solution by firstly employing decouple methods.If the decoupled techniques fail, the coupled approaches would be employed.For example, M* [24] solves the multiagent path planning problem by taking the decouple manner first, and when the robots' paths conflict, the conflicting agents are merged into a meta-agent and planed the path by a coupled planner.
The recent works based on CBS also take the idea of hybrid approaches, for example, MetaAgent-CBS [25], and its improved version [26], which employs the decoupled techniques first and detects the conflicts, then merges the conflicting agents and applies coupled methods.

III. PROBLEM FORMULATION
To make a fair comparison with the MA-RRT* algorithm, which is simulated on graphs, the paper tests both the two algorithms(MA-RRT* and MA-RRT*FN) in a fourconnected grid world G M and uses the following definition: Assuming that n agents labeled 1, ..., n are running on a Euclidean space, and each agent, which takes up a single cell x i (i ∈ [1, n]) of the grid world, has a unique start location s i and destination d i .For each timestep, all agents can move to its four neighbor cells x i (x i ∈ children(G M , x i )) if it is free or stay on its current location [8].Besides, the transitions are prohibited in which agents pass through each other.
A cell is free means that it will not be occupied by an agent at the end of the timestep and does not include an obstacle.The timesteps that a single agent stays on a grid cell are represented as dur(x i ).The total number of timesteps c that the agent has taken from its start state s i to the goal location d i is regarded as the cost of the individual agent's path path i .If all the agents can reach their goal without collision, then the sum of each path cost is taken as the cost of the final solution, which is the metric of solution quality.Formally, Where p stands for an n-tuple of paths (p 1 , ..., p n ).To simplify the representation of nodes in the rapidly random tree, this paper uses x to represent the n-tuple of position (x 1 , ..., x n ).The start positions of all agents are given as s, which is an n-tuple (s 1 , ..., s n ).Similarly, the n-tuple (d 1 , ..., d n ) is the destination d.Thus, a node in the tree can be denoted as an n-tuple joint state, and each state stands for the position of a single agent.

IV. MA-RRT* AND MA-RRT* FIXED NODES
The multi-agent RRT* algorithm is designed based on RRT* algorithm, which can expeditiously find a path from a specific start location to a given target region in continuous state space by incrementally building a tree [3].When the first solution is found, the RRT* algorithm will continue to improve the solution by sampling new random states in the configuration space, which would cause to the discovery of a lower-cost path.
The MA-RRT* inherits all the properties of RRT*.The main difference is that, in continuous configuration space, if two nodes are mutually visible, then they can be connected.While in the discrete space, two nodes can only be connected if a valid path between the two nodes can be found by the heuristic search.Thus, The MA-RRT* more like a graph version of RRT*(G-RRT*), unless it searches for the shortest path in a configuration space which stands for the joint-state of all agents [4].
The algorithm 1 shows the skeleton of MA-RRT* algorithm.It begins with a tree that is rooted at the joint initial state x init and continues to sample the random state x rand from free joint configuration space before extending the tree to x rand .This loop will continue until it is interrupted.
The MA-RRT* Fixed nodes(MA-RRT*FN) utilizes the skeleton of the MA-RRT* algorithm and extends it with some node removing procedures.Therefore, the MA-RRT*FN behaves like MA-RRT* before the maximum number of nodes is reached, and after the number of nodes reaches a threshold, it continues to optimize the tree by removing the weak nodes that are not likely on the path reaching the goal while adding the new node.
The skeleton of MA-RRT*FN is shown in algorithm 2. Initially, the tree grows before the maximum number of nodes M is attained, after which the MA-RRT*FN removes a node that has one or no child in the tree before adding a new node.The MA-RRT* and MA-RRT*FN use the same skeleton of EXTEND and GREEDY procedure, shown in algorithm 3 and 4 respectively.
Like the MA-RRT*, in each iteration of MA-RRT*FN, the SAMPLE routine randomly chooses a free state in the joint space.Then, the EXTEND function generates a new node x new in the free space by steering from the nearest node to the new randomly sample, and then check whether T ← (V, E); 4: x rand ← SAM P LE 5: T ← (V, E); 7: x rand ← SAM P LE; if N o F orceRemovalP erf ormed() then end if 15: end while contained in this tree.If so, x new will be deleted from the tree, and the EXTEND function will restart, if not, x new will be added to the tree.After that, the algorithm searches the nodes that near the x new to construct the nodes set X near and chooses a node as the parent of x new , which makes x new has the lowest cost to initial state, from X near and x nearest .Finally, it updates the cost of X near by rewiring to x new if these nodes decrease the total cost by assigning x new as the parent.
Unlike MA-RRT*, the MA-RRT*FN employs a node removing procedure in the EXTEND function, shown on lines 24 and 25.During the EXTEND procedure, the algorithm updates the cost of nodes near the newly added node x new .If a node x near from X near could reach a lower cost to the initial state by reconnecting to the newly added node, then the algorithm would check whether the parent of this node has only one child and whether the number of nodes in the tree reaches M. If so, x near will be rewired as a child of x new , and the parent of x near will be deleted.If none of the nodes in the near domain of x new has only one child to remove, then the ForcedRemoval procedure in algorithm 2 will be employed, which searches the entire tree, except the x new and the goal node, to find the nodes without children and deletes one randomly [6].In case no nodes are deleted in EXTEND and ForceRemoval function, x new is removed from the tree.
x min ← x nearest 10: for all x near ∈ X near do 11: x min ← x near for all x near ∈ X near n{x min } do 22: if onlyChild(parent(x near )) and M = N odesInT ree(v) then

25:
RemoveN ode(parent(x near )) return G = (V , E ) 33: end if MA-RRT*FN has the same GREEDY procedure as MA-RRT*.In the GREEDY procedure, the joint state is decomposed to n single-agent states.Thus, the algorithm can steer each agent from its start node s to the destination d for one timestep separately by merely depending on heuristic guided search, which utilizes Euclidean distance as the metric, and then check the path generated for all agents collide or not.If those paths are conflicted, the algorithm will return the path calculated in the prior timestep; if not, the algorithm will check whether all agents reach the target, if they do, the algorithm would return the path of all agents as a series of joint transitional states between the s and d, forming an edge in the tree.If the goal is not attained and the cost of paths exceeds the user-specified threshold c max , the algorithm will return the path between the s and the currently arrived node.(path i , ..., path n ) ← path

4:
for all x i ∈ x do 5: end for 10: if not COLLISION F REE(path 1 , ..., path n ) then end if 8: end while Both MA-RRT* and MA-RRT*FN evenly sample the random states in agents joint configuration space, which would cause a relatively lower convergence rate.To improve the speed of MA-RRT*FN in finding the solutions, we take the ideas from isMA-RRT*, the improved version of MA-RRT* proposed in [4].The improved algorithm is called informed sampling MA-RRT*FN(isMA-RRT*FN), shown in algorithm 5, which runs G-RRT* for every single agent to find some high-quality solutions and then runs MA-RRT*FN for all agents together with biased sampling, which samples states near the single-agent optimal path.

V. EXPERIMENTS AND RESULTS
The paper first compares the capability of the MA-RRT*, MA-RRT*FN, isMA-RRT* and isMA-RRT*FN in terms of scalability and suboptimality, then compares the memory cost and solution quality of these algorithms in a 50x50 grid with 3 agents navigation.In the sampling procedure, all four algorithms choose the final goal state as the new random sample with the probability of p, which is the user-specified parameter, to speed the procedure of spanning towards the target.All experiments are performed on matlab 2018a 64- bit in a common program framework and tested on intel core i7 8700k 3.7 GHz CPU.
To make a fair comparison between these four algorithms, this paper utilizes the problem instance set of [4], mentioned as follows, to evaluate the capability of the algorithms.The agents run in a grid-like square-shaped world, where each agent occupies a single cell.At each timestep, all agents can stay on the cell waiting for other agents or move to the 4neighborhood cell of its location if these cells are free.The ten percent of the grids are removed to represent obstacles or barriers.A unique start location and destination are selected randomly for every agent.
The problem instances set varies in the following two parameters: The grid sizes: 10x10, 30x30, 50x50, 70x70, 90x90 and the numbers of agents: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, which are the same to [4].The two parameters are combined in each grid size and number of agents.For each combination, this paper randomly sets 120 instances.Therefore, the first experiment contains 6000 different problem instances in total.All algorithms are implemented on the same instance set, and the runtime of each instance is limited to 5 seconds.For MA-RRT*FN and isMA-RRT*FN algorithm, the maximum number of nodes is set to 200.
The results are plotted in Fig. 2 and Fig. 3.In Fig. 2, the values in the x-axis are the index of instances which are sorted according to the runtime needed when the first valid solution is found, the values in the y-axis are the runtime when the algorithm finds the first solution.For each algorithm, the ordering can be different.The last point of x-position in the performance curve indicates how many instances are solved within 5 seconds.It can be seen that resolves 66% of the instances, MA-RRT*FN 65%, isMA-RRT* 86% and isMA-RRT*FN 87%, from the problem instance set.The relative solution quality is shown in Fig. 3.The experiment compares all algorithms in terms of the first returned solution and the best returned solution within 5 seconds runtime limit.The suboptimality is calculated by the following formula: suboptimality = the cost of returned solution the cost of optimal solution − 1 • 100.
Then the paper compares the four algorithms in terms of memory cost and convergence rate.For clarity, this experiment fixes the two parameters, the grid sizes: 50x50 and the numbers of agents: 3, to qualitatively show the memory needed and convergence rate of all algorithms.And for this problem instances set, this experiment randomly sets 120 instances with different random obstacles and different start locations and destinations.All algorithms run on the same instance set, and the maximum number of iterations of each instance is limited to 5000.For MA-RRT*FN and isMA-RRT*FN , the maximum number of nodes is set to 1000.Fig. 4 and Fig. 5 show the average minimum path cost and the average number of nodes in the tree versus the iterations of all algorithms in terms of the solutions of 120 instances, respectively.The x-position of the first point in the Path cost curve can be interpreted as the solution the algorithm found at the first iteration.For those who do not find a path at the current iteration, the cost of their first solution will be taken into account to compute the average minimum path cost at the current iteration.
Fig. 4 shows that the MA-RRT*FN has a similar convergence rate to MA-RRT* while its number of nodes in the tree is much less, as shown in Fig. 5, memory required for MA-RRT* grows linearly with the iterations increase, while the number of nodes stored in MA-RRT*FN is lower and fixed.The results also indicate that the isMA-RRT*FN performs better than isMA-RRT* concerning the convergence rate to the optimal path, while it also has a lower and fixed memory.Finally, MA-RRT* is proved to be convergent in [4], although the experiment results strongly imply that the MA-RRT*FN and isMA-RRT*FN also have the theoretical guarantee of converging to the optimal path, the optimality of MA-RRT*FN and isMA-RRT*FN remains to be proved.

VI. CONCLUSION AND FUTURE WORK
The paper proposes MA-RRT*FN, an anytime algorithm that has lower demands in the memory requirements, to slove the multi-agent path planning problem in the systems with limited storage.Unlike MA-RRT*, whose memory cost is indefinite as the solution converges to the optimal path, our techniques employ some node removing procedures to limit the number of the nodes storing in the tree and keep on optimizing the path when finding the solution in agents' joint-state space.The experiment results show that the MA-RRT*FN, which has a fixed number of nodes in the tree, performs as well as MA-RRT* in terms of scalability, solution quality and convergence rate in solving multi-agent path planning problems.While the improved version, isMA-RRT*FN, has a better convergence rate and scalability than isMA-RRT* while its memory required is much lower.
This paper simulates the algorithm on a motion graph, which connectes the states in the tree by a valid path.However, the algorithm can also be extended to continuous space by using the straight-line visibility approach in place of the GREEDY function.
In the future, we will continue to improve the convergence rate of MA-RRT*FN by employing different node removing procedures.Another area we would like to explore is the application of the MA-RRT*FN algorithm in a more dense environment.