A Distributed ADMM Approach to Non-Myopic Path Planning for Multi-Target Tracking

This paper investigates non-myopic path planning of mobile sensors for multi-target tracking. Such problem has posed a high computational complexity issue and/or the necessity of high-level decision making. Existing works tackle these issues by heuristically assigning targets to each sensing agent and solving the split problem for each agent. However, such heuristic methods reduce the target estimation performance in the absence of considering the changes of target state estimation along time. In this work, we detour the task-assignment problem by reformulating the general non-myopic planning problem to a distributed optimization problem with respect to targets. By combining alternating direction method of multipliers (ADMM) and local trajectory optimization method, we solve the problem and induce consensus (i.e., high-level decisions) automatically among the targets. In addition, we propose a modified receding-horizon control (RHC) scheme and edge-cutting method for efficient real-time operation. The proposed algorithm is validated through simulations in various scenarios.


I. INTRODUCTION
Technological advances in sensor networks have enabled many applications such as monitoring pollution/atmospheric phenomena, reconnaissance and surveillance missions in both defense and civilian areas, traffic monitoring, forest fire localization and wildlife tracking [1], [2].These sensor networks are based on two types of sensors: static and mobile sensors.It is widely demonstrated that mobile sensors offer distinctive advantages over static ones in terms of the quality of sensing and estimation [3]- [11], coverage area, data offloading [12] and adaptability to dynamic environments [13].However, for mobile sensor networks, it is crucial to efficiently operate sensing agents to maximize information gain about target systems while satisfying constraints on entire systems.This problem is described as a sensor planning problem that determines the future utilization of sensing agents given the current states of resources and environments.
One typical approach for the sensor planning problem is making myopic plans covering one or few time steps to reduce the complexity of the problem.For instance, Hoffmann and Tomlin [4] proposed a single step mobile sensor planning algorithm for single target tracking based on an information-theoretic metric.They plan the one step ahead control inputs of the sensing agents to maximize the mutual information between the sensor measurements and * Equal contribution.† Corresponding author: hanlimc@kaist.ac.kr the estimated target state.In multi-target tracking tasks, most existing works adopt the two-phase approach in which they cluster and assign targets to each sensing agent and, then, solve the split problem.[8], [9].However, such myopic planning algorithms are vulnerable to bad local optima and show poor performances for sensor networks with sensing holes [14], [15].Their optimization over the near future cannot reflect long-term goals such as taking targets out of sensing holes by changing the positions and directions of sensors.Thus, we focus on non-myopic planning algorithms which explore multiple time steps.
There are several existing approaches to non-myopic sensor planning problems.One natural way is to extend sensor scheduling methods [16]- [18] for static sensor networks.As the methods select operating sensors among finite candidates, control inputs of the mobile sensors are discretized and the same optimization methods are applied for the finite optimization variables [3], [15].However, this approach suffers from a critical computational issue that the time complexity grows exponentially as the planning horizon increases.Another approach is the sampling-based planning.Informationrich Rapidly-exploring Random Tree (IRRT) algorithm [19] connects random sample points on the sensing space so that the sensor measurements along the connected paths maximize the information on a single target.Luders et al. [20] apply this IRRT algorithm in a two-phase approach so that it tracks multiple targets.However, these sampling-based works do not guarantee any optimality in their solutions [21].
A better candidate for non-myopic planning of mobile sensors is local trajectory optimization method.It searches for a solution to the optimization problem near the given initial guess using gradient descent algorithms.This gradientbased trajectory optimization has been successfully applied in many single-target tracking applications by employing analytically differentiable filter models on target estimation [5]- [7], [10], [13].It shows much faster computation time than the discrete optimization methods and guarantees local optimality unlike the sampling-based methods.However, in multi-target tracking problems, local trajectory optimization methods can be easily trapped in a bad local optimum unless a proper initial guess is given.
In this paper, we investigate non-myopic path planning of mobile sensors for multi-target tracking problems.There is a key challenge to employ local trajectory optimization method in multi-target tracking problems; to make a good initial guess, we need a high-level decision-making process that determines which mobile sensors should track which targets and in what order the targets should be tracked.
This problem is also referred to as the task-assignment problem.Unfortunately, the problem is hard to be formulated to reflect the mobility of the sensors and expected changes in the states of the sensors and the targets.Instead, existing works, while they are myopic, with two-phase approach use heuristics for the formulation [8], [9], which degrade the quality of initial guess and, thus, the target estimation performance.Therefore, the objective of this study is to solve the non-myopic path planning problem for multi-target tracking without heuristic task-assignment.
The main contribution of this paper is to provide a new practical algorithm without heuristic task-assignment on the non-myopic path planning problem for multi-target tracking.More specifically, we 1) reformulate the general non-myopic path planning problem for multi-target tracking to a distributed optimization problem with respect to targets, 2) solve the transformed problem by combining ADMM and local trajectory optimization method, and 3) propose a modified receding-horizon control (RHC) scheme and edge-cutting method for efficient real-time operation Our algorithm circumvents the task-assignment problem by solving subproblems for each target, which have only single task, and inducing consensus among those subproblems automatically.Consequently, our distributed framework integrates the decision-making process into the trajectory optimization process so that they are solved simultaneously without the aid of heuristic task-assignment algorithms.
The alternating direction method of multipliers (ADMM) [22]- [24] has been also effectively applied to other problems in robotic control and sensor scheduling [25]- [29].For instance, Bento et al. [25] adopted ADMM to multi-agent trajectory planning for collision-avoidance.ADMM is used to incorporate various constraints, such as energy minimization, with the planning objective.Ong et al. [26] also developed a message-passing algorithm based on ADMM to solve a collision-avoidance problem with communication constraints.Mordatch et al. [28] combined trajectory optimization and global policy (neural network) learning using ADMM in a robotic control problem.Liu et al. [29] used ADMM to solve an optimization problem with cardinality functions in scheduling static sensor networks for state estimation of dynamical systems.Like these works, we formulate a problem on which ADMM can work effectively and successfully solve the problem.
The rest of this paper is organized as follows.In Section II, we define the general non-myopic path planning problem for multi-target tracking.Then, we reformulate the original problem as a distributed trajectory optimization problem in Section III.Section IV presents the details of our algorithm including the modified RHC scheme.In Section V, computational complexity of the proposed algorithm is analyzed and the edge-cutting method is introduced to reduce the complexity.Finally, the simulation results of the proposed algorithm are given in Section VI.

A. Non-Myopic Path planning for multi-target tracking
We consider the problem of non-myopically planning multiple mobile sensors to efficiently track multiple targets.We formulate this problem based on the known dynamics of targets and sensors, as well as the sensing model.From these models, we plan an optimal path that each sensor takes successive measurements to estimate target states efficiently.
In this work, targets are assumed to follow linear stochastic dynamics which is a common assumption in the literature [3], [5], [13], [15].Let the index set of targets be T = {1, 2, • • • , M }, where the number of targets, M , is known and fixed.The j-th (j ∈ T ) target's state, x (j) t , is updated as: where ω w,t ) is the Gaussian random noise independent of other targets and measurements.A (j) t and Σ (j) w,t are the transition matrices and noise covariance matrices of the j-th target, respectively.
For sensor dynamics, we employ a general model, f (i) , for each sensor.Let the index set of mobile sensors be t , is expressed as: where u t is the control input at time t.Here, each mobile sensor is assumed to be carried by one fixed-wing UAV for simplicity. 1enoting the measurement taken by the i-th UAV against the target j at time t as z (i,j) t , the sensor measurement model can be expressed as follows: where v v,t ) is the Gaussian random noise, independent of other measurements and targets' motion noise, w (j) t .These models are incorporated to form an objective function, U (j) , which indicates the uncertainty of the target state estimation.The goal of the general target tracking problem is to estimate target states accurately, which corresponds to minimizing the uncertainty.Since we are solving a nonmyopic path planning problem, the optimization variables are a sequence of states/control inputs of mobile sensors over certain future time steps, T .Thus, in general, non-myopic path planning for multi-target tracking can be formulated as [3], [13], [15]: where t , u ) , Here, the optimal trajectory, P, Ū , satisfies the dynamics constraint and the given initial condition.U (j) (p t ) is the uncertainty of the j-th target at time t, u t R t u T t penalizes the control effort of the mobile sensors along the trajectory, and η(X t ) is an additional constraint, such as collision-avoidance, related to the states of the mobile sensors.R is a weighting matrix.

B. Target State Estimation
The target states are estimated through the extended Kalman filter (EKF), which is widely used for state estimation of non-linear systems [30].Note that any other nonlinear Gaussian filters, such as the unscented Kalman filter, can be incorporated instead.The targets are assumed to have independent motions and be measured independently via data association methods [31].Then, their joint distribution could be factorized and a single tracking filter is used for each target [10].
For each target, its belief states, (x t ), are updated by (for simplicity, the superscript (j) is temporarily dropped): where Equations ( 5) and (6) define the belief dynamics.The second term in (5), called the innovation term, depends on the measurements z t+1 .Equation ( 6) evolves given the current covariance of the target, regardless of the measurement.
Our optimization problem in (4) evaluates the target state estimations in future time steps.Since future measurements are unknown in advance, we treat the innovation term in (5) as stochastic, which follows N (0, K t H t Σt ) [32].Defining the belief state of the j-th target as b (vec[•] is a vector with upper triangular portion of the input matrix since Σ (j) t is symmetric), the stochastic belief dynamics of the target is given by: where

C. Uncertainty Measure
The uncertainty term in (4) represents the inaccuracy of target state estimation.For this measure, we use The trace of the covariance matrix of target state estimation physically means the average variance of the estimation error and is one of the widely used metrics [5], [9], [13], [19].This measure is in a quadratic scale for physical distance error.
The expectation is taken since we are employing stochastic belief dynamics as in (7).Note that any other differentiable metric can be incorporated instead.

A. Challenge of Non-Myopic Problem
The non-myopic path planning problem in (4) generally faces an abundance of local optima due to the non-convexity of its cost function.This non-convexity primarily comes from the dependency of optimal sensor trajectories on highlevel decision-making.They significantly vary upon which target is assigned to which sensor and the order in which each sensor tracks the assigned targets.Since most trajectory optimization methods find solutions around an initial guess, our problem with numerous local optima is very sensitive to initialization.Therefore, we need to guide a solution by properly initializing the optimization variables.
Finding a good initial guess for the problem induces another hard problem, task-assignment.Unfortunately, it is very difficult to construct the task-assignment problem in our setting to reflect the mobility of the sensors and expected changes in the states of the sensors and the targets.One feasible option is to heuristically assign tasks based on the initial states of targets and sensors utilizing the traveling salesman problem (TSP) [33], CBBA [20], [34], and/or clustering [8], [9].However, it is still hard to consider the change of targets and sensors' states over time sufficiently, and the option may provide a bad local optimum.

B. Transformed Problem
Instead of solving the hard problem, we detour the difficulty and reformulate the original problem (4) into a distributed trajectory optimization problem.For each target, we solve: where Trajectory optimization for each target is much easier than that for multiple targets because task-assignment is not considered.Also, it is not difficult to generate the initial guess for the problem of tracking a single target.By solving the trajectory optimization for each target and making consensus, this distributed optimization is solved through Alternating Direction Method of Multipliers (ADMM) [22], [35]. 1) Distributed Alternating Direction Method of Multipliers: ADMM is an algorithm that efficiently optimizes objective functions composed of terms that each has efficient solution methods [22].For our purposes, we use a consensus and sharing optimization form of ADMM, also known as distributed ADMM [23], [24], [35].We consider the following minimization problem: This problem can be rewritten with the local variable a l and common global variable b: This is called the global consensus problem, as the constraint is that all the local variables should be equal (i.e., agree).Solving this problem is equivalent to optimizing the augmented Lagrangian: where λ l is the Lagrange multipliers for a l − b = 0, and ρ > 0 is a penalty weight.The ADMM algorithm updates the local variable a l , the common global variable b and the Lagrange multiplier λ l as follows.For k = 0, 1, • • • , we iteratively execute the following three steps: where α > 0 is a step-size parameter.Here, note that the first and the last steps can be carried out independently for each l = 1, • • • , L. Thus, we can run the algorithm in parallel depending on the implementation with multiple but no more than L computational devices, where each device performs an update on its set of variables.The optimization results a l are communicated back to a master node, which performs the global variable b update and returns the result back to other worker nodes.The processing element that handles the global variable b is sometimes called the central collector or the fusion center.

A. Overall Algorithm
We now describe the distributed trajectory optimization algorithm to solve the problem (10).After applying initial conditions, our algorithm works iteratively based on the distributed ADMM with three major steps.First, the original problem ( 10) is split into a separate optimization problem for each target and solved in parallel.Based on the optimization results, the common global variables are updated for consensus.Then, the Lagrange multipliers are updated to enforce the local variables to be closer to the common global variables.The complete procedure is summarized in Algorithm 1.
To solve the augmented Lagrangian, the split optimization problem for each target j is: where p j and u j are the local variables for the states and control inputs of the mobile sensors, respectively.This is a trajectory optimization problem with two additional quadratic cost terms and can be solved with the existing trajectory optimization method described in Section IV-B.The split problems are independent and, then, able to be solved in parallel, as described in Section III-B.1.In this work, the trajectory variables P j , U j in (15) are initialized by the method described in Section IV-B.2, and they are averaged to initialize the global common variables P C , U C .The updates for the common global variables are as follows: This is the mean consensus process that takes the average of the results for each trajectory optimization.Then, the common global variables affect the update of the trajectory variables for each target j (15), which causes the results Algorithm 1 Distributed Trajectory Optimization Update P λ , U λ using (17) 7: end while of trajectory optimization for each target to converge to the same solution.
The updates of the Lagrange multipliers for each target j are as follows: The important role of the multipliers is making high-level decisions, i.e., task-assignment.They put more forces on local variables which are largely deviated from the global variables to be close by reducing its magnitude more.By affecting the cost function in (15), the optimal trajectory solutions are guided into more informative regions.This process lets the task-assignment be automatically accomplished so that mobile sensors can further reduce the uncertainty of the targets.

B. Trajectory Optimization
We adopt belief space iterative Linear Quadratic Gaussian (belief space iLQG) [32] to solve the trajectory optimization problems described in (15).Belief space iLQG extends the iLQG method [36] to Partially Observable Markov Decision Processes (POMDPs) using a Gaussian belief, instead of a fully observable state.In the forward pass, belief space iLQG uses a standard EKF to compute the next time step belief.For a backward pass, belief space iLQG linearizes covariance in addition to quadratizing states and control inputs.Although the applications of this approach are focused on control problems, it is directly applicable to estimation problems (e.g., target tracking) due to the duality of control and estimation [37] The belief space iLQG operates with the belief dynamics of the whole system with targets and sensors.That of the targets is given in (7).By integrating it with the dynamics of mobile sensors, the belief dynamics of the entire system is represented by: where 1) Control Policy: Here, we explain the details of solving (15) with the belief space iLQG.By linearizing the dynamics around the nominal trajectory distribution, the approximate dynamics is expressed as: where .
where 0,t = (v t , ūt ).δv t = v t − vt and δu t = u t − ūt are the deviations from the nominal trajectory.The terms with subscripts denote Jacobian and Hessian matrices of their respective functions.
Given the linear dynamics (19) and quadratic cost (20), we obtain a quadratic approximation of the value function along a nominal trajectory vt : Following the dynamic programming principle [38], the Bellman equation for the value function V t (v t ) and control policy π t (v t ) in discrete-time are specified as: where By substituting equations ( 19) and ( 20) into ( 22), the Qfunction is given by: where In order to find the optimal control policy, we compute the local variations in the control δû that minimize the quadratic approximation of the Q-function in (22): The optimal control can be found as ût = ūt + δû t .Substituting (25) into (22) gives the value function V t (v t ) as a function of only v t in the form of (21): This recursion then continues by computing the control policy for time step t − 1.
2) Initial Guess Generation: For fast convergence to a better solution, good initial guesses are still required in the subproblems (15), in which it is much easier to find a good one than the original problem (4).For the initialization, we derive an initial guess generator for a case of fixed-wing UAV mounting a sensor with perpendicular boresight.The generator is based on Dubins paths [39] that can mimic the behavior of a fixed-wing UAV.
We first generate a continuous path.Let r sen and x(j) be the minimum turning radius of the i-th UAV, the distance between the nadir point and the center point of the i-th UAV's sensor footprint and the expected position of the target j, respectively.As shown in Fig. 1(a), a path is created to arrive as soon as possible at any point of a circle with radius r (i) sen centered on x(j) with the sensor boresight toward the center.Then, the remaining path is set for the sensor to rotate along the circle.Here, r (i) sen should be set to an appropriate value reflecting the movement of the target.The expected position x(j) of the target j can be predicted using the current estimates and the dynamic model of the target.
Then, the generated path is discretized at specific intervals and used as waypoints to generate a sequence of control inputs.For the sequence of desired waypoints ([p x , p x , p we construct a controller that reduces the angle between the line connecting the two adjacent waypoints and the heading direction of the UAV.A damping effect is also added to the controller for the stability.The geometry of this path tracking problem is depicted in Fig. 1(b).In our simulation, the control input of the PD-controller can be computed with (30) as follows: where K p and K d are the gains of the controller.

C. Modified Receding Horizon Control for Real-Time Operation
For real-time operation, the planning should be done while the UAVs are executing the previous plan.Otherwise, they stop and wait for the next plan for every planning cycle.Also, the established plan could be corrected by reflecting newly obtained information during its execution.For these purposes, we propose a modified receding-horizon control P, Ū = P, Ū new 13: end while (RHC) scheme with introducing a new concept of control horizon.RHC is a form of a feedback control system which executes only the forepart of a plan and start over the planning for a shifted time horizon with updated information [40].Extending this concept, our algorithm plans for future T p time steps while executing the previous plan for T c time steps with predefined control horizon T c and planning horizon T p (T c ≤ T p ).Note that T c is determined by the available computational resources and the communication delay, and T p should be large enough so that the trajectories of the mobile sensors can cover the overall domain of interest.The modified RHC is described in Algorithm 2. Here, we explain a single cycle starting at t = 1.Pre-calculating the future plan for t = T c to T c + T p requires the target belief estimation at t = T c .We use the maximum likelihood assumption and estimate the future belief without actual measurements (line 6).A new plan P, Ū new for t = T c to T c + T p is created using the proposed distributed trajectory optimization algorithm (line 7).This planning is done in parallel to executing the previous plan.While the algorithm plans for T p future time steps, only the fore T c steps are actually executed (line 9).The process is then repeated for the next time window.

A. Complexity Analysis
The computational complexity of each iteration of the distributed ADMM is dominated by solving the subproblems defined in (15) because computing the global common variable and Lagrange multipliers requires very simple computations.The subproblems are solved by the belief space iLQG method described in Section IV-B , and its computational complexity in control problems is already well analyzed in [32].The bottleneck of running time in our problem lies in the calculation of the matrix Q vv,t in (23).
We first consider the case of a single target.Let m and n be the state dimensions of the target and the mobile sensors, respectively.As the target belief contains the covariance ].The number of cycles cannot be expressed in terms of dimensional notation, but convergence can be expected after 10-100 cycles in practice.Furthermore, it is known that belief space iLQG converges with a second-order rate to a local optimum [32].
For multiple targets, our algorithm operates distributedly.The complexity is, then, multiplied by the number of iterations of distributed ADMM if the computation is fully parallelized.ADMM is known to require a few tens of iterations to converge with modest accuracy for many applications [22], [29], [41].Note that when the original problem ( 4) is solved non-distributedly, the complexity is multiplied by M 2 , instead, regarding the independence among the targets.This is more complex than our distributed approach when M is large, and it even requires high-level decision-making as mentioned in Section III-A.There have been many researches on fast convergence for general distributed ADMM [25], [27], [42].In this paper, we propose an edge-cutting method which is tailored to our algorithm and effectively shortens its running time.

B. Edge-cutting Method
The computational load of our algorithm can be effectively reduced when we consider the characteristics of our problem.As pointed out earlier, our distributed scheme automatically makes high-level decisions (i.e., task-assignment).A mobile sensor may choose a path specialized for certain targets during the consensus and disregards the others.In this case, the sensor cannot contribute to tracking the disregarded targets, and, thus, it is a bad investment of computational resources to solve the trajectory optimization of the sensor for those targets.Based on this observation, we propose a heuristic method to detect this inefficacy and halt the unprofitable investment.
The edge-cutting method detects unavailing sensors for each target for every consensus step in ADMM.For given > 0, the method excludes sensor i in the tracking task of target j if where E z (j) ) denotes the uncertainty of the target j when the sensor i is excluded.The is chosen to be a threshold for judging whether the sensor i reduces the uncertainty of the target j enough or not.The uncertainty is evaluated for P C , U C obtained in the consensus phase (16) except for the sensor i.This pruned architecture is reset to the original one for new time windows.When is properly chosen (not too large), the method has empirically shown similar target estimation results as not applying it in most cases with significantly reduced computation time.The method is quantitatively evaluated in Section VI-D.
Our edge-cutting method has similar, but different, concept with the three weight message-passing algorithm (TWA) in [25], which solves multi-agent trajectory optimization for energy minimization and collision-avoidance.As shown in Fig. 2, our algorithm can be also expressed with a bipartite graph as in [25] based on the structure of ADMM.Each blue node on the left side represents the trajectory optimizer solving (15) for each target, and each red node on the right side is the consensus node solving (16) for each sensor.For the consensus node, TWA employs weighted averages and excludes an edge from the consensus process by applying a zero weight without any change in the graph.On the other hand, our method prunes an edge of the graph when (28) is satisfied.This effectively reduces the complexities of the EKF in Section II-B and the iLQG in Section IV-B by reducing the dimensions of the sensor states and the measurement vector.Note that our algorithm allows pruning an edge since the trajectory optimization for a target always produces feasible solutions when solved for subsets of sensors.This is not the case for the optimizers for collision-avoidance in [25].

VI. SIMULATIONS
In this section, we numerically investigate the computational properties and the applicability of the proposed algorithms.To ascertain the effectiveness of the distributed trajectory optimization algorithm, we compare it with existing approaches; non-myopic trajectory planning methods with heuristic task-assignments as well as the myopic one.We also evaluate the modified RHC scheme and the edge cutting algorithm which are proposed for effective real-time operations.

A. Simulation Model
We assume that the UAVs fly at different altitudes from each other to alleviate collision-avoidance constraints.Throughout the entire simulations, the planning horizon, T p , is set to 100 seconds, which is enough for the UAVs to cover all areas of interest with flying speed of 16m/s.Furthermore, we set R = 100 in (15), and ρ v = {0.02,0.02, 500 × 180/π, 500 × 180/π}, ρ u = 500 × 180/π, α v = 0.2ρ v and α u = 0.2ρ u in (17).It is important to set the penalty parameters, ρ, α, properly for fast convergence of ADMM.While they are chosen empirically in this work, many extensions of the distributed ADMM algorithm have been explored to adaptively select the penalty parameters.We would like to refer the readers to [22], [42].
For target motion model, we consider the motion on the XY-plane with a four-dimensional state vector of positions and velocities (in x and y axes) for a target.The matrix parameters of the linear stochastic dynamics in (1) are set as: where dt is the time interval between two successive measurements, and q is the process noise intensity representing the strength of the deviations from the predicted motion by the dynamic model.When q is small, this model represents a nearly constant velocity.For simulations, we set dt = 0.1s and q = 0.01.
For sensor platforms, we consider a set of fixed-wing Unmanned Aerial Vehicles (UAVs).The dynamics of the UAVs is given by: where [p t and φ (i) t are the position, speed, heading angle and bank angle of i-th UAV at time t, respectively.Gravity acceleration is denoted as g.
The sensors are mounted on the left or right sides of the UAVs and are aimed in a 30-degree downward direction as shown in Fig. 3.Each sensor measures the signal-to-noise (f) Resulting variances of target estimations  ratio (SNR) [43]: where α, β, and γ are selected to model the SNR of the sensor.r (i,j) t and θ (i,j) t represent the distance between the sensor and the target as well as the angle between the sensor's boresight and the direction from the sensor to the target, respectively.When γ is set to zero, the sensor measures the quasi-distance, which is a commonly used model [4], [16], [44].We set α = 10, γ = 8ln2 and β = 0.

B. Process of Convergence
First, we consider a toy example in which a single UAV tracks two targets with zero initial velocities to explore the convergence process of the proposed distributed trajectory optimization algorithm.Figs. 4 (a-d) present the snapshots of the optimization process.In the beginning, the algorithm generates an initial guess (trajectory) of the UAV for each target independently, without any high-level decisions to consider both targets simultaneously.In the early phase, note that the consensus trajectory largely deviates from the distributively optimized one for each target.As the number of iterations increases, the deviation gets smaller and eventually vanishes.This convergence is enabled as each subproblem reflects the result of the other subproblem from the previous iteration through the Lagrange multipliers in (15).By repeating this interaction, the algorithm provides a non-myopic trajectory planning result that sequentially tracks two targets, as shown in Fig. 4(e).
In addition to observing the convergence, we evaluate the performance of the converged trajectory in reducing the uncertainty of each target.As the components of the posterior uncertainties after executing the plan, the variances of the target state (position) estimations are logged along the time as shown in 4(f).The variances of each target are effectively reduced when the target is in the sensing range, while they increase gradually due to the stochastic motion of the target when it is in sensing holes.

C. Single UAV tracking a Few Targets
In the second example, we compare the proposed algorithm with existing approaches under scenarios where one UAV tracks a few targets with zero initial velocities.The initial positions of the UAV and the targets on the XY-plane are randomly (uniformly) generated within a 1200m×1200m area.The targets are on the ground level and the UAV is flying at a fixed altitude of 100m.The initial covariance matrices of the target estimations are also randomly generated as diagonal matrices of which diagonals are picked uniformly between 10 and 100.
As the existing works [9], [20] on multi-target tracking tasks are myopic approaches, we extend them to non-myopic   ones.We adopt their task-assignment algorithms to produce initial guesses on the original undistributed problem (4) and solve it through belief space iLQG as in our algorithm.
The algorithms assign clustered targets to the closest mobile sensor (w.r.t. the center of the cluster) and determine the priorities of the clusters when they assigned to the same sensor.In [9], the priority is determined as the sum of the uncertainties (at t = 0) of the targets in each cluster, while [20] uses the Euclidean distance of each cluster from its assigned mobile sensor and the number of targets in the cluster.Initial guesses are generated in a similar way as described in Section IV-B.2, as shown as the dotted lines in Figs.5(c-d), so that the UAV can visit the clusters (single target for each cluster in this case) sequentially based on the task-assignment results.Additionally, we tested two more undistributed algorithms.First, we apply a myopic algorithm [8] to (4) repeatedly to cover the planning horizon with single-step plans.The other is solving (4) with a naive initial guess, u t = 0 ∀t.
Table I summarizes the average result over the 100 randomly generated topologies for each number of targets.To evaluate the planned trajectory, we compare the uncertainty of target estimation, which is the primary objective to minimize in our problem, averaged over targets, timesteps,  To analyze the details, we investigate a single instance of the randomly generated topologies in depth.
Fig. 5 shows the result of each algorithm in a scenario of single UAV tracking three targets.As shown in the Fig. 5(a), the myopic algorithm just plans a trajectory that minimizes only the control efforts until any target gets in the sensing range of the UAV.This is because the UAV cannot reduce the uncertainty without observing the targets.In Fig. 5(b), the resulting trajectory tracks only a single target.This shows that it could be stuck in a bad local optimum to perform non-myopic trajectory optimization without proper decision making.The two-phase approaches in Figs.5(c-d) show better trajectories so that they sense more targets.However, the trajectories include more portion of idle time without observing a target than the proposed algorithm in Fig. 5(e).
To compare the results numerically, we examine their average performance over 1000 randomly generated instances on the topology of Fig. 5.As shown in Table II, we evaluate posterior uncertainty and root mean square error (RMSE) of target localization in addition to uncertainty.Even though the topology was simple and easy to have intuitive decision making, the proposed algorithm still outperforms the others.While the two-phase approaches define new approximate/heuristic cost functions for the task-assignment and that decision making considers only the current status of the sensor networks, the proposed algorithm incorporates the decision making into the optimization process while considering the mobility of the sensor platform and the target state through the distributed formulation.

D. Multiple UAVs tracking Numerous Targets
The third example considers the situation in which two UAVs track numerous targets with zero initial velocities.To compare the proposed algorithm to the two-phase approaches [9], [20] involving target clustering, the topologies of the targets and UAVs are randomly generated based on three-cluster structure.(Choose centers of the clusters and the number and positions of the targets for each cluster randomly.)For the clustering algorithm, density-based spatial clustering of applications with noise (DBSCAN) [45] is used.As parameters of DBSCAN, we set the minimum number of targets for a single cluster and the maximum radius of a cluster to 2 and 150m, respectively.Table III summarizes the results.While the two-phase approaches show similar average uncertainty values, the proposed algorithm outperforms them.
We further investigate an instance of the randomly generated topologies to analyze the performance gap in detail.Fig. 6 shows the resulting trajectories of the two-phase approaches and the proposed algorithm in a scenario with 10 targets.The Euclidean heuristic and uncertainty-based taskassignments provide the same initial guesses, where each cluster is assigned to a single UAV.However, the proposed algorithm automatically generates the trajectories for the two UAVs to visit the same cluster.Table IV shows that the trajectories outperform those from the two-phase approaches.Even in this well-clustered topology, the proposed algorithm plans trajectories more optimized for the mobile sensors to visit the target clusters without the aid of clustering and taskassignment algorithms.

E. Evaluation of Edge-Cutting Method
We evaluate the edge-cutting method for the scenario in Fig. 6(b).Varying the values, the time when each edge is cut is analyzed in Fig. 7(a-c).As expected, larger implies However, as shown in Fig. 7(d), too large values make the solution of the trajectory optimization converge to a bad local optimum with a larger cost.This is because too easily cut edges change the original problem significantly beyond the degree of proper approximation.When the epsilon value is set appropriately, not too large, the edge-cutting method produces the same result as it is not applied ( = 0) with effectively reduced dimensions of the optimization.In Figs.7(e-f), the actual computation time is compared between the proposed algorithms with and without the edgecutting method.The comparison is performed on a 3.4 GHz Quad-Core Intel(TM) i7 PC.The results show that the edgecutting method reduces the computation time effectively by rapidly achieving a solution with lower cost.The messagepassing algorithm, TWA, in [25] is also tested to verify the effectiveness of the proposed method in our problem.TWA reduces the computation time compared to not applying it, but the amount is much smaller than the proposed method.Comparing Figs.7(e) and (f), they verify that parallel computing further reduces the computation time effectively since our algorithm based on ADMM is capable of parallelization.

F. Real-Time Operation Test
In the last example, shown in Fig. 8, we consider a scenario in which two UAVs track three targets, one with an initial velocity of 3m/s in the x-axis direction and the others with zero initial velocities.We applied the distributed trajectory optimization algorithm, the edge-cutting method, and the modified RHC scheme all together.It is assumed that a particle filter is used to estimate the states of the targets in the actual operating environment of the UAVs.The control horizon, T c , is set to 25 seconds considering the computation time of the algorithms.It is assumed that the sensors are attached to the left side of one UAV and to the right side of the other UAV, and that UAVs fly at different altitudes of 100m and 120m, respectively.Figs.8(a-d) show some of the simulated results under the assumption that the UAVs operate for 400 seconds.A notable point in this example is that the UAV automatically passes the tracking mission to the other UAV after about 160 seconds.Figs.8(e-f) show the state estimation results of the targets, where it can be confirmed that all targets are tracked reliably through the modified RHC scheme.

VII. CONCLUSIONS
In this paper, we investigated a distributed optimization approach to trajectory planning for a multi-target tracking problem.In order to solve the trajectory optimization and task-assignment problems simultaneously, the distributed trajectory optimization problem was formulated, and then solved by integrating a variant of a differential dynamic programming algorithm called iterative Linear-Quadratic-Gaussian (iLQG) algorithm with the distributed Alternating Direction Method of Multipliers (ADMM).In addition, we proposed an edge-cutting method to reduce the computation Soon-Seo Park and Han-Lim Choi are with the Department of Aerospace Engineering & KI for Robotics, KAIST.Youngjae Min is with School of Electrical Engineering, KAIST.Jung-Su Ha is with Machine Learning and Robotics Lab, University of Stuttgart and Max Planck Institute for Intelligent Systems.Doo-Hyun Cho is with Mechatronics R&D center, Samsung Electronics.

Fig. 1 :
Fig.1:The path is generated by a modified Dubins path algorithm, which is used to provide waypoints for the generation of the initial guess.Control inputs are generated so that UAV follows the path connecting the waypoints through the PD-controller, and these are used as the initial guess of the local trajectory optimization method.

Fig. 4 :
Fig. 4: (a)-(d) process of convergence and (e) final trajectory of the proposed non-myopic planning in a scenario with a single UAV and two stationary targets, and (f) the variances of the target estimations over the execution of the plan

Fig. 5 :
Fig. 5: Comparison of existing approaches and the proposed algorithm in a scenario with a single UAV and three targets

Fig. 6 :
Fig. 6: Comparison of two-phase approaches and the proposed algorithm for a scenario with two UAVs and 10 targets

Fig. 7 :
Fig. 7: (a)-(c) Cutting time analysis for the edge-cutting method with three different values (Each solid line corresponds to an edge and indicates that the UAV is considering the target in the trajectory optimization process), (d) cost comparison for different values and (e)-(f) comparison of the computation times (a) time = 32 sec.(b) time = 96 sec.(c) time = 160 sec.(d) time = 256 sec.(e) Position error (f) Velocity error

Fig. 8 :
Fig. 8: (a)-(d) Snapshots of the current plan (solid line) and future plan (dashed line) for each mobile sensor in the replanning process scenario with two UAVs, one moving target and two stationary targets, and (e)-(f) particle filter estimation results

1 :
Choose penalty weight ρ and step-size α 2: Generate initial guess Pj , Ūj for each target j 3: while not converged do

TABLE I :
Average uncertainty over 100 simulations for each algorithm in the single UAV scenarios.

TABLE II :
Average result over 1000 simulations for each algorithm in the single UAV scenario.

TABLE III :
Average uncertainty over 100 simulations for each algorithm in two-UAV scenarios.

TABLE IV :
Average result over 1000 simulations for each algorithm in the two-UAV scenario.Since the uncertainty has a meaning of the average variance of target estimation, we can compare the values approximately in a quadratic scale.Our algorithm shows the best performance among the tested algorithms.