Active Perception based Formation Control for Multiple Aerial Vehicles


Abstract-Autonomous motion capture (mocap) systems for outdoor scenarios involving flying or mobile cameras rely on i) a robotic front-end to track and follow a human subject in real-time while he/she performs physical activities, and ii) an algorithmic back-end that estimates full body human pose and shape from the saved videos. In this paper we present a novel front-end for our aerial mocap system that consists of multiple micro aerial vehicles (MAVs) with only on-board cameras and computation. In previous work [1], we presented an approach for cooperative detection and tracking (CDT) of a subject using multiple MAVs. However, it did not ensure optimal view-point configurations of the MAVs to minimize the uncertainty in the person's cooperatively tracked 3D position estimate. In this article we introduce an active approach for CDT. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs can now actively compute optimal local motion plans, resulting in optimal view-point configurations, which minimize the uncertainty in the tracked estimate. We achieve this by decoupling the goal of active tracking as a convex quadratic objective and non-convex constraints corresponding to angular configurations of the MAVs w.r.t. the person. We derive it using Gaussian observation model assumptions within the CDT algorithm. We also show how we embed all the non-convex constraints, including those for dynamic and static obstacle avoidance, as external control inputs in the MPC dynamics. Multiple real robot experiments and comparisons involving 3 MAVs in several challenging scenarios are presented (video link : https://youtu.be/1qWW2zWvRhA). Extensive simulation results demonstrate the scalability and robustness of our approach. ROS-based source code is also provided.

I. INTRODUCTION
Aerial motion capture (mocap) of humans in unstructured and outdoor scenarios is a challenging research problem that has recently attracted widespread attention [1]-[5]. It facilitates human pose and shape analysis in sports medicine, high-accuracy cinematography, rescue operations in disaster scenarios and several other applications that require estimation of human 3D position or full-body skeleton pose and shape in a natural environment. Our aerial mocap system consists of a team of micro aerial vehicles (MAVs) that autonomously detect, track and follow a person. The online task is that the MAVs cooperatively and continuously estimate the 3D global position of the person and always keep him/her centered in the field of view of their onboard cameras while he/she performs activities, such as walking, running or jumping, in an unstructured, outdoor environment. The online task implies only on-board and real-time computation, which is the core focus of this article. The offline task, which is not addressed in this paper, is to estimate full-body skeleton pose and shape from the on-board acquired and saved image stream. In our previous work [1], we developed a marker-less, deep neural network based cooperative detection and tracking (CDT) algorithm for our aerial mocap system. However, that work has two major shortcomings. First, the motion planner on each MAV, which computes the 3D way-points for the MAV to follow the person, does not consider the uncertainty in the tracked person's 3D position estimate. This implies that the MAV formations are non-optimal viewpoint configurations for minimizing the jointly perceived target uncertainty.

{rahul.tallamraju, eprice, roman.ludwig, black, aamir.ahmad}@tuebingen.mpg.de, kamal@iiit.ac.in. 1 Perceiving Systems Department, Max Planck Institute for Intelligent Systems, Tübingen, Germany; 2 Max Planck Institute for Biological Cybernetics, Tübingen, Germany; 3 Agents and Applied Robotics Group, IIIT Hyderabad, India.
Second, the local motion plans generated by the MAVs using a model-predictive controller (MPC) are subsequently modified using potential fields for collision avoidance. This further leads to sub-optimal trajectories.
In this work, we solve both of the aforementioned problems through a novel convex decentralized formation controller based on MPC. It actively minimizes the joint uncertainty in the tracked person's 3D position estimate while following him/her and generates safe motion plans which avoid dynamic and static obstacles. We achieve this by decoupling the joint uncertainty minimization into i) a convex quadratic objective that maintains a threshold distance to the tracked person, and ii) linear constraints that enforce angular configurations of the MAVs with respect to (w.r.t.) the person. We theoretically derive this decoupling based on Gaussian observation model assumptions within the CDT algorithm. Dynamic collision avoidance w.r.t. other MAVs, the tracked person and static obstacles is also achieved using linear constraints in the MPC. Both the dynamic collision avoidance and the angular configuration constraints are essentially non-convex. We preserve convexity in our MPC formulation by converting both of these constraints to external control input terms for the MPC dynamics, in addition to the default MPC control input. To determine the equivalent external control input, we use a repulsive potential field function to compute the corresponding force for each non-convex constraint, at every time step, over the whole planning horizon, and then feed it directly to the MPC dynamics. A feasible solution to the optimization guarantees obstacle avoidance and enables active minimization of the jointly estimated uncertainty in the person's tracked state. We also provide an analysis of our collision avoidance guarantees that takes into account the real-time motion plans of neighboring agents, real-time message communication delays between agents and the uncertainties in the poses of the MAVs and the position of the tracked person.
Thus, the following are the key novelties in this article.
• A convex MPC-based formation controller that actively minimizes the jointly estimated uncertainty in the tracked person's 3D position, when he/she performs activities in an unstructured and outdoor environment,
• guaranteed dynamic collision avoidance within a convex optimization framework,
• open-source ROS-based [6] source code of our method.
We demonstrate the efficiency of our approach both in simulation and real robot experiments. Real robot experiments involving 3 MAVs in several challenging scenarios are presented, along with comparisons to state-of-the-art approaches. Figure 1 presents a multi-exposure image of a part of one of these experiments. Extensive simulation results demonstrate the robustness of our approach and its scalability w.r.t. the number of robots.

II. STATE-OF-THE-ART
In this section, we review state-of-the-art literature in the context of multi-robot active tracking. In [7], multi-robot trajectories are analytically derived and uncertainty bounds are identified for optimizing the fused estimate of target position. In [8], a non-convex optimization problem is solved analytically to minimize the trace of the fused target covariance. However, that approach is centralized and uses Gauss-Seidel relaxation to compute optimal solutions for multiple robots. Centralized non-convex optimization [9], [10] is used to track a target in stochastic environments and locally minimize the fused estimate of the target position. In [11], active perception based formation control is addressed using a decentralized non-linear model-predictive controller. However, their method is validated only for ground robots and only identifies sub-optimal control inputs, owing to the non-convexity of the optimization and the lack of collision avoidance guarantees. In [12], a perception-aware model-predictive controller generates real-time motion plans which maximize the visibility of a desired target. The motion plans are, however, generated only for a single aerial robot. A marker-based multi-robot aerial motion capture system is presented in [5], where formation control is achieved using a decentralized non-linear model-predictive controller. Scalability (> 2) in the number of robots and formation collision avoidance behaviors are not explicitly addressed in their approach. Our previous work [1] uses convolutional neural networks for person detection and performs cooperative person tracking using distributed extended Kalman filters. However, the person tracking formation controller used in [1] does not optimize motion plans to minimize the uncertainty in the joint estimate of the target position. Furthermore, inter-agent and target collision avoidance is not guaranteed, which leads to sub-optimal and unsafe local formation motion plans.
Formation Configuration: In order to identify formation configurations which optimize the target observation, we refer to the analytically derived sensor geometries in [13]. These geometries provide a useful basis to leverage numerical optimization techniques for decentralized motion planning. Optimization ensures that the emergent behavior of the decentralized system of robots minimizes the uncertainty in the fused target position estimate.

Obstacle Avoidance: In our previous work [14], we developed a convex optimization program to generate local collision-free motion plans while tracking a movable pick-and-place target using multiple aerial vehicles. This approach generates fast, feasible motion plans and has linear computational complexity (O(n)) in the number of environmental obstacles. The work was validated only in a simulation environment and, moreover, obstacle avoidance was not guaranteed. In our current work, we consider real robots, stochasticity in the environment and bounds on the repulsive potential fields to guarantee collision avoidance and generate safe local motion plans.

III. PROPOSED APPROACH
In this section we present our proposed approach for active perception based formation control. Sec. III-A introduces the system setup and the mathematical notation used in the paper. Sec. III-B details the modules involved in the cooperative detection and tracking algorithm used in this work. Sec. III-C derives the person observation model for each MAV and identifies multi-robot configurations which jointly minimize the uncertainty in the target person's state estimate. Sec. III-D uses these models and the formation constraints from Sec. III-C to formulate an MPC for decentralized motion planning.

A. Preliminaries and Problem Statement
Let there be K MAVs tracking a person P. The pose (position and orientation) of the k-th MAV in the world frame at time t is given by ξ_t^k = [(x_t^k)^T (Θ_t^k)^T]^T ∈ R^6. The MAV's pose uncertainty covariance matrix is denoted as Σ_t^k ∈ R^{6×6}. Each MAV has an on-board, monocular, perspective camera, rigidly attached to its body frame. The observation measurement of person P's centroid made by MAV k in its local spherical coordinates is given by a mean range r_t^k, bearing φ_t^k and azimuth θ_t^k (see Fig. 2). We assume zero correlation between the measurements along these axes. The measurements are associated with zero-mean measurement noise, denoted by σ_{r,t}^k, σ_{φ,t}^k and σ_{θ,t}^k, respectively. In local Cartesian coordinates, the measurement mean and covariance are denoted as y_t^{P,k} ∈ R^3 and Σ_t^{P,k} ∈ R^{3×3}, respectively. Each MAV's fused estimate of the person's centroid position and its uncertainty covariance in the world frame are denoted as x_t^P and Σ_t^P. These estimates are computed by fusing the person observation measurements of MAV k, y_t^{P,k}, with the measurements y_t^{P,j} received from its teammates j. The MAVs operate in an environment with M known static obstacles^1 and K − 1 neighboring MAVs as dynamic obstacles. The goal of each MAV is to cooperatively track the target person using a replicated instance of the proposed formation control algorithm. This involves (a) minimizing the MAV's fused measurement uncertainty covariance Σ_t^P and (b) avoiding the M + K − 1 environmental obstacles based on the positions of the M static obstacles and the local horizon motion plans received from the K − 1 teammates. The MAVs detect a moving person using a deep neural network based marker-less CDT algorithm, which was developed and presented in our previous work [1].
As part of CDT, each vehicle runs its own instance of an extended Kalman filter (EKF) to fuse the detections (∈ R^3) made by the MAV and all its teammates, which results in a replicated joint state estimate of the person at each MAV. This estimate is then used to predict a region of interest (ROI) in future camera frames. The ROI guides the yaw of each MAV, thereby ensuring that the target person stays in the MAV's field of view. In Section III-D, we use these joint state estimates to drive our decentralized model-predictive formation controller. In [1], the focus was on developing the cooperative person detection and ROI prediction algorithms, and the contribution was in the green blocks. The work in this paper focuses on developing the module in blue, which utilizes information from the other modules. At every time step, the controller of MAV k

^1 Our approach is agnostic to how the static obstacles are detected.
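The fusion step above can be sketched per axis. The following is a scalar simplification we introduce for illustration only; the actual system fuses full 3×3 covariances with the recursive merging equations of [16].

```python
# Scalar-per-axis sketch of the detection fusion in CDT. The function
# names are ours; the real pipeline runs an EKF over full 3x3 covariances.

def merge(mean_a, var_a, mean_b, var_b):
    """Fuse two independent Gaussian estimates of the same scalar."""
    k = var_a / (var_a + var_b)            # Kalman-style gain
    mean = mean_a + k * (mean_b - mean_a)
    var = (1.0 - k) * var_a                # fused variance <= min(var_a, var_b)
    return mean, var

def merge_xyz(est_a, est_b):
    """Fuse per-axis (mean, variance) pairs of the person's 3D position."""
    return [merge(ma, va, mb, vb)
            for (ma, va), (mb, vb) in zip(est_a, est_b)]

if __name__ == "__main__":
    mav_k = [(1.0, 0.5), (2.0, 0.5), (1.5, 0.2)]   # own detection
    mav_j = [(1.2, 0.5), (2.2, 1.0), (1.5, 0.2)]   # teammate's detection
    print(merge_xyz(mav_k, mav_j))
```

Since every MAV applies the same fusion to the same set of exchanged detections, all MAVs end up holding a replicated joint estimate, which is what the decentralized controller relies on.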

C. Observation Measurement Models and Joint Uncertainty Minimization
In this subsection, we first derive a person observation model for a single MAV in its local-frame Cartesian coordinates. Subsequently, we state assumptions for determining the MAV observation model in our scenario. Finally, for multiple MAVs, we identify formation objectives and constraints which facilitate the minimization of Σ_t^P.

1) Observation Measurement Model for a MAV: To derive the person observation model for a single MAV k, we drop the superscripts and subscripts of the spherical and Cartesian coordinate measurements introduced in the schematic of Fig. 2. The variance of the range measurement evolves quadratically with the distance r to the person, σ_r^2 = K_1 r^2. This is because the projected size of the person on the image plane varies quadratically with the distance to the person [15]. Secondly, the change in the person's projected size on the image plane is negligible under independent variations of θ and φ. Hence, we can assume constant variances w.r.t. these angles, i.e., σ_φ^2 = K_2 and σ_θ^2 = K_3, where K_1, K_2, K_3 are positive constants specific to the system. We now derive the MAV observation model in the local Cartesian coordinates (x = r cos φ sin θ, y = r sin φ sin θ, z = r cos θ) as follows. Equations (1)−(5) state results which are useful for deriving the person observation model.
where the random variables a and b are uncorrelated. (2) lists expected values for functions of the Normally distributed random variables φ and θ.
are constants. Using (1), we derive the following.
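For reference, the expectations of this kind follow from standard identities for Normally distributed random variables; a sketch in our own notation, for a ~ N(μ_a, σ_a²) and uncorrelated jointly Gaussian a, b:

```latex
\begin{aligned}
\mathbb{E}[ab]       &= \mathbb{E}[a]\,\mathbb{E}[b], \\
\mathbb{E}[\cos a]   &= e^{-\sigma_a^2/2}\cos\mu_a, &
\mathbb{E}[\sin a]   &= e^{-\sigma_a^2/2}\sin\mu_a, \\
\mathbb{E}[\cos^2 a] &= \tfrac{1}{2}\!\left(1 + e^{-2\sigma_a^2}\cos 2\mu_a\right), &
\mathbb{E}[\sin^2 a] &= \tfrac{1}{2}\!\left(1 - e^{-2\sigma_a^2}\cos 2\mu_a\right).
\end{aligned}
```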
Using equations (1)−(5), we derive the person uncertainty covariance model in Cartesian coordinates.

2) Assumptions in Our Case Scenario: We make the following assumptions on the MAV observation model. Bearing: We employ a separate yaw controller [1] to guide the orientation of the MAVs towards the target person. Due to this, we assume that the measured value of φ is almost always approximately zero (φ ≈ 0). Azimuth: We assume that the azimuth angle θ remains close to π/2 (θ ≈ π/2), as observed in Fig. 2. The rationale for this assumption is that, in our approach, the MAVs always maintain a constant altitude above the person. The only change that affects the radial distance r to the person is the horizontal distance to him/her from the MAVs. Although the change in r with horizontal distance is not negligible, the change in K_x(φ, θ), K_y(φ, θ), K_z(θ) with θ is small enough to assume θ ≈ π/2. Therefore, the expected values of the measurements along the range, bearing and azimuth are assumed to be E(r) = r, E(φ) = 0 and E(θ) = π/2, respectively. Using the aforementioned assumptions, the measurement noise for a single MAV is as follows.
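Under these assumptions, the Cartesian covariance can be sketched via first-order (Jacobian) propagation of the spherical noise, evaluated at φ = 0, θ = π/2. This is our own illustration, not the paper's exact derivation, and the constants K1, K2, K3 are placeholder values:

```python
# First-order propagation of spherical measurement noise to local
# Cartesian coordinates at the operating point phi = 0, theta = pi/2.
# K1, K2, K3 are the system-specific variance constants (values made up).
K1, K2, K3 = 0.01, 0.005, 0.005

def cartesian_covariance_diag(r):
    """Diagonal of the 3x3 person-observation covariance at range r."""
    var_r     = K1 * r * r       # sigma_r^2 grows quadratically with range
    var_phi   = K2               # constant bearing variance
    var_theta = K3               # constant azimuth variance
    # At (phi=0, theta=pi/2) the Jacobian of the spherical-to-Cartesian
    # transform is diag(1, r, -r), so the propagated covariance is:
    return [var_r, r * r * var_phi, r * r * var_theta]

def trace_cov(r):
    return sum(cartesian_covariance_diag(r))

if __name__ == "__main__":
    # The trace grows quadratically with range: doubling r quadruples it.
    print(trace_cov(8.0), trace_cov(16.0))
```

This makes the later design choice concrete: the observation uncertainty shrinks monotonically with range, so each MAV should fly as close to the person as safety allows.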

3) Minimizing Fused Uncertainty for MAV Formations:
The minimum uncertainty in the person's fused state estimate is achieved when the following conditions are met.
1) The angle between adjacent MAVs about the person's estimated position is 2π/K. Given measurements from observations of different robots (or sensors), [13] analytically derives optimal sensor geometries for unbiased and efficient passive target localization. Geometries which maximize the determinant of the Fisher information matrix are identified. For K (> 3) independent measurements, the angular separation between sensors resulting in minimum fused uncertainty is either 2π/K or π/K. We validate this analysis by computing the merged covariance Σ_t^P of the individual MAV measurement uncertainties Σ_t^{P,k}, using the recursive covariance merging equations from [16]. The angular configurations which minimize the trace of Σ_t^P are highlighted in the yellow boxes of Fig. 4c. The angular configuration which minimizes the merged covariance is given in Fig. 4a (this also corresponds to the red-encircled box in Fig. 4c). Fig. 4b illustrates some other configurations. Of these, Fig. 4b 1 and 2 are examples of non-optimal configurations, while Fig. 4b 3 is an example of several other minimum-covariance configurations. However, in this work the angular configuration of Fig. 4a is used, because it allows the tracked person to be viewed uniformly from different directions and reduces the possibility of losing the target person to environmental occlusions. Moreover, as our system is motivated by the application of outdoor motion capture, observing an asymmetric target (at least in one axis) from different directions increases visual coverage.
2) The measurement uncertainty for each individual MAV is minimized.
The previous angular-constraint condition minimizes the fused uncertainty for any given set of measurements from the MAVs. However, the fused covariance remains a function of each individual MAV's measurement uncertainty Σ_t^{P,k}, ∀k ∈ [1 ··· K]. Σ_t^{P,k} is a function of the relative position of MAV k w.r.t. the tracked person. Notice that each MAV's position remains controllable without changing the angular configuration of the whole formation. Therefore, the fused uncertainty is fully minimized by additionally minimizing the trace of Σ_t^{P,k} for each MAV k. How this is done is discussed in the following subsection.
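The benefit of spreading viewing angles can be checked numerically. The toy 2-D sketch below is ours, with made-up variance values: two range-elongated noise ellipses are fused in information form, and the fused trace is smallest when the viewing directions are orthogonal (π/K for K = 2), consistent with the geometries of [13].

```python
import math

# Toy 2-D check: two MAVs observe the person with noise ellipses elongated
# along the range direction; fusing orthogonal viewpoints beats aligned ones.
VAR_RANGE, VAR_TANGENT = 1.0, 0.1   # hypothetical per-sensor variances

def obs_cov(angle):
    """Covariance R * diag(VAR_RANGE, VAR_TANGENT) * R^T for view angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[VAR_RANGE*c*c + VAR_TANGENT*s*s, (VAR_RANGE - VAR_TANGENT)*c*s],
            [(VAR_RANGE - VAR_TANGENT)*c*s, VAR_RANGE*s*s + VAR_TANGENT*c*c]]

def inv2(m):
    det = m[0][0]*m[1][1] - m[0][1]*m[1][0]
    return [[ m[1][1]/det, -m[0][1]/det],
            [-m[1][0]/det,  m[0][0]/det]]

def fused_trace(a1, a2):
    """Trace of the covariance after information-form fusion of two views."""
    i1, i2 = inv2(obs_cov(a1)), inv2(obs_cov(a2))
    info = [[i1[r][c] + i2[r][c] for c in range(2)] for r in range(2)]
    fused = inv2(info)
    return fused[0][0] + fused[1][1]

if __name__ == "__main__":
    print(fused_trace(0.0, 0.0))           # aligned viewpoints (worse)
    print(fused_trace(0.0, math.pi / 2))   # orthogonal viewpoints (better)
```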

D. Decentralized Quadratic Model Predictive Control
The trace of (8) is as shown below.
Minimizing (9) ensures that each MAV improves its person observation measurement uncertainty. However, r ≈ 0 would lead the MAVs to collide with the target person and with each other. In order to ensure the safety of the target person, we instead drive each MAV towards a circular safety surface of radius d_des, at the desired height h_des above the target person. Rewriting (9) in terms of the MAV position x_t^k and a desired position x̄_t^k chosen on the safety surface at r_des = √(d_des² + h_des²), we obtain the following equations.
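The safety-surface geometry above can be sketched directly: keep the MAV's current horizontal bearing to the person and place the way-point at horizontal distance d_des and height h_des. The function name and coordinate convention are ours; the values d_des = h_des = 8 m are those used in the experiments of Sec. IV.

```python
import math

# Desired way-point on the safety surface around the tracked person,
# chosen along the MAV's current horizontal bearing to the person.
D_DES, H_DES = 8.0, 8.0   # values from the real-robot experiments

def desired_position(mav_xyz, person_xyz):
    dx = mav_xyz[0] - person_xyz[0]
    dy = mav_xyz[1] - person_xyz[1]
    bearing = math.atan2(dy, dx)      # keep the current approach direction
    return (person_xyz[0] + D_DES * math.cos(bearing),
            person_xyz[1] + D_DES * math.sin(bearing),
            person_xyz[2] + H_DES)

if __name__ == "__main__":
    wp = desired_position((20.0, 0.0, 10.0), (0.0, 0.0, 0.0))
    # Distance from person to way-point equals r_des = sqrt(d^2 + h^2).
    print(wp, math.sqrt(sum(v * v for v in wp)))
```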
The desired position x̄_t^k is chosen along the MAV's current orientation to the target person. We define the objective of our MPC running on each MAV k as follows. Minimize
• the distance to the desired target surface (safety surface) around the tracked person,
• the difference between the velocity of the MAV and the fused velocity estimate of the target person, and
• the MAV control effort.
Consequently, the optimization objective for active tracking using MPC is as given below.
where X_t^k … In order to ensure continuous and smooth person tracking, we minimize the difference between the velocity of the MAV, ẋ_t^k, and the fused velocity estimate of the person, ẋ_t^P, available from the EKF. W_K = diag(K, K, K) is experimentally determined. W_E, W_Ẋ ∈ R^{3×3} are custom-defined positive definite weight matrices. Furthermore, the active tracking MPC is subject to the following constraints.
• A non-convex constraint of maintaining a desired angular difference of 2π/K w.r.t. the other MAVs about the fused estimate of the person's position x_t^P,
• a non-convex constraint to maintain at least a distance d_min from other obstacles, and
• regional and control saturation bounds on the MAV states and control inputs over the horizon.
The non-convex constraints are embedded into the MPC dynamics as repulsive potential field forces f_t^k [14]. The computation of these forces is discussed in Sec. III-E. The convex MPC is thus defined by the following equations.
The 3D translational motion of MAV k is governed partly by the accelerations u_t^k(n) = ẍ_t^k(n) and partly by an external control input f_t^k(n), where n is the horizon time step. The state-space translational dynamics of the robot are given by (13). The dynamics matrix (A ∈ R^{6×6}) and control transfer matrix (B ∈ R^{6×3}) are given below, where ∆t is the sampling time. f_t^k(n) is a real-time computed external control input representing the non-convex constraints of formation configuration and obstacle avoidance. The following section details the computation of these forces.
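Per axis, these dynamics reduce to a discrete double integrator in which the repulsive force enters as an extra acceleration alongside the MPC control input, i.e. x_{n+1} = A x_n + B(u_n + f_n). A minimal sketch, with a hypothetical ∆t:

```python
# Per-axis discrete double-integrator step of the MPC dynamics, with the
# potential-field force f added to the optimized control input u.
DT = 0.01  # sampling time (placeholder value)

def step(pos, vel, u, f):
    """One horizon step along a single axis."""
    a = u + f                                  # total acceleration
    new_pos = pos + vel * DT + 0.5 * a * DT * DT
    new_vel = vel + a * DT
    return new_pos, new_vel

if __name__ == "__main__":
    pos, vel = 0.0, 0.0
    for _ in range(100):                       # 1 s of unit acceleration
        pos, vel = step(pos, vel, 1.0, 0.0)
    print(pos, vel)                            # 0.5 m travelled, 1.0 m/s
```

Because f enters the dynamics as a known input rather than as a constraint, the optimization over u stays a convex quadratic program.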

E. Computation of External Control Input f_t^k(n)

A constraint enforcing angular configurations w.r.t. the other MAVs involves computing the angle of each neighboring MAV w.r.t. the target person using an inverse tangent function. This is a non-convex operation in R^3. Moreover, avoiding collisions by maintaining a minimum distance w.r.t. (a) other MAVs, (b) obstacles in the environment or (c) the tracked person constrains the robot to operate in non-convex regions. In our previous work [14], we introduced the concept of converting non-convex constraints into repulsive potential field force values f_t^k(n). These values are computed at every time step and fed into the MPC dynamics to enforce the non-convex dependencies and ensure quick convergence to an optimal solution. In this work, we utilize these potential field values to enforce the formation angular configuration and collision avoidance. The repulsive potential field function used in this work is referred to as the cotangential repulsive force function. The cotangential repulsive force acting on MAV k due to a non-convex constraint with respect to an entity j (teammate, obstacle or target person) is given below.
where z = (π/2)·(d − d_min)/(d_max − d_min) and d is a chosen distance metric (absolute angular difference or Euclidean distance) between MAV k and entity j. (17) varies hyperbolically with the distance d. The potential field introduces two threshold radii, d_max and d_min. d_max defines the region of influence of the potential field; d_min is the distance below which the value of the potential field tends to infinity. Practically, this is clamped to a large positive value F_max, which is derived to guarantee obstacle avoidance (see Sec. III-E.3).
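One plausible scalar realization of this clamped cotangential field, under the definitions above (F_max and the radii are placeholder values, and the exact scaling of (17) may differ):

```python
import math

# Clamped cotangential repulsive force: zero beyond d_max, saturated at
# F_MAX at or below d_min, hyperbolic (cot) in between.
F_MAX = 50.0  # placeholder for the derived maximum force magnitude

def cot_force(d, d_min, d_max):
    if d >= d_max:
        return 0.0                        # outside the region of influence
    if d <= d_min:
        return F_MAX                      # clamp instead of -> infinity
    z = 0.5 * math.pi * (d - d_min) / (d_max - d_min)
    return min(F_MAX, 1.0 / math.tan(z))  # cot(z), clamped to F_MAX

if __name__ == "__main__":
    for d in (0.5, 1.5, 3.0, 6.0):        # d_min = 1, d_max = 5
        print(d, cot_force(d, 1.0, 5.0))
```

Note that cot(z) goes to infinity as d approaches d_min and to zero as d approaches d_max, which gives the smooth fade-out at the boundary of the region of influence.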

1) Active Tracking Input:
To satisfy the inter-MAV angular configuration constraints, we compute (i) the angle of robot k w.r.t. the tracked person^2, φ_t^k(n), (ii) the angle of the neighboring MAV j w.r.t. the tracked person, φ_t^j(n), and (iii) the distance metric for (17) as the absolute angular difference d_act(n) = |φ_t^k(n) − φ_t^j(n)|, at each horizon time step n. We then apply a cotangential force along the direction normal (in the plane of MAV motion) to the MAV's approach towards the target person. d_act(n), ∀n > 0, is computed using MAV k's horizon motion plan and teammate j's communicated horizon motion plan. The total active tracking force per teammate j is computed as follows.
The factor |r_des − ‖x_t^k(n) − x_t^P‖_2| in (18) helps avoid field local minima. This is because, if two robots have similar angles of approach, the MAV farther away from the desired distance r_des is repelled with a higher force than the MAV near the desired distance^3. c is a small positive constant which ensures that the force magnitude is non-zero at the target surface if the desired angular difference is not yet achieved. The force F_act^{k,j}(n) acts in a direction α^⊥ normal to the direction of approach to the target person, α = x_t^k(n) − x_t^P. There are two possible direction choices, namely ±α^⊥, in the plane of approach towards the target person. The direction pointing away from the neighboring MAV j is chosen to ensure a natural deadlock resolution for robots having similar angles of approach to the target person (see the horizon motion plans in Fig. 5). The total active tracking external control input of MAV k is then computed as follows. Notice that the non-convex constraint requiring the inequality of two angles and maintaining a desired inter-MAV angular difference is thus converted into an external control input, computed at every iteration of the MPC (used in (13)), which preserves the convexity of the optimization. For K MAVs, the value of d_max for the active tracking force is 2π/K.

^2 We overload the notation φ_t^k to represent the angle of MAV k in the world frame w.r.t. x_t^P until the end of this paper.
^3 Refer to [14] for a detailed explanation of how these forces avoid the field local-minima problems associated with potential-field based planners.

2) Obstacle Avoidance Input: We use (17) to compute external control inputs for obstacle avoidance, the formulation of which is discussed below. Dynamic Obstacle Avoidance: The K − 1 teammates of the k-th MAV are its dynamic obstacles.
The Euclidean distance d_dyn(n) = ‖x_t^k(n) − x_t^j(n)‖_2, ∀n > 0, between MAVs k and j is computed using MAV k's horizon motion plan and teammate j's communicated horizon motion plan. The external control input acts along the direction β, a unit vector pointing away from teammate MAV j's horizon motion plan. The total dynamic obstacle avoidance external input f_dyn^k(n) is as follows.
The choices of d_max and d_min for dynamic obstacle avoidance are discussed in the following section, III-E.3. Static Obstacle Avoidance: We consider M static obstacles in the environment. To compute the external input w.r.t. an obstacle m ∈ [1, M], the Euclidean distance d_sta(n) between MAV k's horizon motion plan x_t^k(n) and the location x_m of the static obstacle is computed. The total static obstacle avoidance force is as given below.
Furthermore, multiple static obstacles placed close together might cause MAV k to get stuck in a field local minimum (as shown in [14]). To avoid such scenarios, we compute a force f_ang^k which penalizes MAV approach angles w.r.t. the target person that have static obstacles in the line of approach. The computation of this force is similar to that of the active tracking force described in the previous section. Target Person Obstacle Avoidance: The robots must also maintain a safety distance r_des w.r.t. the target person. The objective of the model-predictive controller guides the MAVs to this safety distance but does not by itself guarantee the safety of the person. To guarantee collision avoidance, we generate a repulsive potential field around the target person, treating the person as another obstacle, with an external input computation similar to (21). The MAVs are thus repelled from the tracked person P by a force f_target^k(n). The total external control input for obstacle avoidance, f_obs^k(n), is the summation of all the aforementioned obstacle-avoiding potential field forces, as given below.
Subsequently, the total external control input f k t acting on MAV k is determined.
The following subsection details the considerations for choosing d_min, d_max and F_max to guarantee obstacle avoidance in stochastic environments.
3) Obstacle Avoidance Guarantee: Fig. 5 details a typical obstacle avoidance maneuver of two MAVs running the model-predictive controller. To ensure a stable repulsive force and guarantee obstacle avoidance, we take into account the following important considerations:
• the maximum tracking error e_max of the low-level controller w.r.t. the generated way-point x̄_t^k for one time step ∆t,
• the maximum velocity ẋ_max of the robot,
• the self-localization uncertainty of the MAV, Σ_t^k,
• the localization uncertainty of neighboring MAVs, Σ_t^j,
• communication or packet loss with neighboring MAVs, and
• a maximum cotangential force magnitude F_max ≥ u_max.
To guarantee obstacle avoidance, the d_min and d_max values of the potential field are defined as follows, where e_max is the empirically determined maximum tracking error in time ∆t along the direction of the way-point x̄_t^k(1), and ẋ_max ∆t is the maximum translational motion of the robot in any direction. r_{σ,t}^k is the length of the maximum eigenvalue of Σ_t^k, and r_σ^j is an experimentally determined maximum possible localization uncertainty (max Σ_t^j) for a neighboring MAV j. Wireless communication delays are unavoidable in real-time implementations and cannot be ignored, as they affect a MAV's estimate of its neighbors' positions. The value of r_{comm,t}^{k,j} increases in proportion to the communication delay between two MAVs k and j: for a delay of ∆t seconds, r_{comm,t}^{k,j} increases by ẋ_max ∆t meters, which is the maximum distance a neighboring MAV can travel in ∆t seconds. Finally, the maximum value of the cotangential force, F_max, should be at least equal to or greater than the maximum control input u_max. Therefore, the vector sum u_max + f_t^k(n) has a magnitude greater than or equal to zero and acts along a direction away from the obstacle.
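One possible composition of the listed safety terms into the inflated minimum distance is sketched below; the paper's exact definitions are in (24)-(25), which are not reproduced here, and all numeric values are illustrative.

```python
# Sketch of the safety-radius bookkeeping of Sec. III-E.3: the effective
# d_min inflates with controller tracking error, per-step motion, both
# localization margins and the communication delay. Values are made up.
X_DOT_MAX = 3.0    # max MAV speed [m/s]
DT = 0.1           # planning time step [s]

def dynamic_d_min(e_max, r_sigma_self, r_sigma_neighbor, comm_delay):
    # Extra margin for what a neighbor may travel during the delay.
    r_comm = X_DOT_MAX * comm_delay
    return (e_max + X_DOT_MAX * DT
            + r_sigma_self + r_sigma_neighbor + r_comm)

if __name__ == "__main__":
    # 1 m controller error, 0.3/0.4 m localization margins, 0.1 s delay.
    print(dynamic_d_min(1.0, 0.3, 0.4, 0.1))
```

The point of the composition is that every source of position error is charged against the repulsive field's inner radius, so the clamped force F_max ≥ u_max can always push the MAV away before the true (uncertain) separation reaches zero.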

F. Algorithm for Active Perception based Formation Control
Algorithm 1 Active Tracking Convex MPC for MAV k with inputs {x P

IV. EXPERIMENTS AND RESULTS
A. Real Robot Experiments
1) Setup: We use three self-developed and customized octocopter MAVs in the field experiments. Each MAV is capable of running the CDT algorithm (Sec. III-B) on-board and in real-time, including the deep neural network. The MAVs are equipped with an Intel i7 CPU, an NVIDIA Jetson TX1 GPU and an OpenPilot Revolution flight-controller system-on-chip. The MAVs use on-board GPS and IMU data, logged at 100 Hz, for localization. We place an Emlid Reach differential GPS receiver on each MAV and on either shoulder of the tracked person to acquire ground-truth locations. The data from the differential GPS is not used during the real-time experiments and is stored for comparison only. For a decentralized implementation, a ROS multi-master setup with communication over WiFi is used. Each MAV continuously captures and stores camera images acquired at 40 Hz. The on-board GPU runs SSD-Multibox [17]. The model-predictive controller is evaluated at 100 Hz using the CVXGEN convex optimization library. The used horizon  3,11]. The distance to the target in all experiments is set to d_des = 8 m and the desired height to h_des = 8 m above the joint target estimate's centroid. We observe that, on average, the tracking error w.r.t. the MPC-generated way-point is around e_max = 1 m; therefore, using (24), d_min = 1.5. d_max w.r.t. neighboring MAVs varies with changes in self-localization uncertainty and communication losses. The task for each MAV in all experiments is to track the person using the CDT pipeline while running the proposed decentralized active-perception based target tracking algorithm. We conducted 5 real-MAV tracking experiments, each with a 3-MAV formation. After the experiments, the error in the tracked position estimate is calculated as the 3D Euclidean distance between the estimated position and the corresponding ground truth provided by the differential GPS system. The error in MAV pose is calculated similarly.
2) Analysis of results: Fig. 8 compares the ground-truth and tracked trajectories, with the estimated uncertainty, during the tracking experiment using our proposed approach in Exp. RE 3. We analyzed person-tracking accuracy, uncertainty, and MAV self-pose accuracy for all experiments in Fig. 7(a-b). Exp. RE 3 achieves significantly more accurate person track estimates (mean error of 0.7 m) than the state-of-the-art method in Exp. RE 1 (mean error of 1.2 m), despite the worse self-pose estimates of the MAVs in Exp. RE 3 (these self-localization errors were due to high errors in the MAVs' GPS localization and may vary arbitrarily over different experiments). This showcases the ability of our approach to attain high tracking accuracy despite bad GPS reception.
Even though the state-of-the-art approach in Exp. RE 2 achieves accuracy similar to that of our approach in Exp. RE 3, our approach outperforms the former in keeping the person always in the camera image and close to the image center, as shown by the bar and box plots in Fig. 7(c-d). Moreover, our approach achieves the lowest uncertainty in the tracked person estimate compared to all other state-of-the-art methods.
The presence of obstacles that the MAVs need to navigate around affects the target tracking and self-pose accuracy in Exps. RE 4 and RE 5. This is mainly due to the additional maneuvering overhead of avoiding obstacles. However, the ability to keep the person close to the center of the image is only slightly affected. Our approach in Exp. RE 3 not only reduces the average distance between the tracked person and each camera's image center, but also ensures that the person is completely covered by the camera image in 97.5% of camera frames. This is a crucial property if the formation is used for aerial motion capture.
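The two image-space metrics reported above (person-to-image-center distance and the fraction of frames in which the person is fully covered by the image) can be computed directly from the per-frame detection bounding boxes. A minimal sketch, assuming axis-aligned boxes in pixel coordinates (the box values and image size below are hypothetical):

```python
def fully_inside(bbox, width, height):
    """True if the detection bounding box (x_min, y_min, x_max, y_max)
    lies completely within the image bounds."""
    x0, y0, x1, y1 = bbox
    return 0 <= x0 and 0 <= y0 and x1 <= width and y1 <= height

def center_offset(bbox, width, height):
    """Pixel distance from the bounding-box center to the image center."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    return ((cx - width / 2.0) ** 2 + (cy - height / 2.0) ** 2) ** 0.5

# Hypothetical detections: one fully visible, one clipped right, one clipped left
boxes = [(300, 100, 380, 340), (600, 200, 660, 500), (-10, 50, 60, 300)]
W, H = 640, 480
coverage = sum(fully_inside(b, W, H) for b in boxes) / len(boxes)
```

Averaging `center_offset` over all frames and `coverage` over the experiment duration yields the statistics plotted in Fig. 7(c-d).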
B. Simulation Experiments 1) Setup: The proposed algorithm was simulated in the Gazebo environment [18]. The identical algorithm codebase is used for experimentation on both the real and the simulated MAVs. The simulations were conducted on a standalone Intel i7-3970X CPU with an NVIDIA GTX 1080 GPU. The environment consists of multiple AscTec Firefly hexarotor MAVs confined to operate in a world of dimensions 20 m × 20 m × 20 m. Each MAV is equipped with a rigidly attached Asus Xtion camera whose parameters (resolution and focal length) are set to those of the real MAV cameras. To be as close as possible to real scenarios, we simulate GPS and IMU drift by imposing a random-walk offset on the ground-truth position. We simulate a human actor model and guide it along a pre-defined trajectory with random motions. The actor traverses a randomly varying terrain interspersed with emulated trees. The trees create obstacles and occlusions for the hovering MAVs and make it challenging to track the person effectively.
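The GPS/IMU drift mentioned above is a cumulative random-walk offset added to the ground-truth position. A minimal sketch of this corruption step (the step standard deviation and trajectory below are illustrative assumptions, not the values used in the paper):

```python
import numpy as np

def add_random_walk_drift(positions, step_sigma=0.05, seed=0):
    """Corrupt ground-truth 3D positions (Nx3 array) with a cumulative
    random-walk offset, emulating slow GPS/IMU localization drift."""
    rng = np.random.default_rng(seed)
    steps = rng.normal(0.0, step_sigma, size=np.shape(positions))
    drift = np.cumsum(steps, axis=0)  # offset accumulates over time
    return np.asarray(positions, dtype=float) + drift

# A stationary MAV sampled 1000 times still appears to wander slowly
truth = np.zeros((1000, 3))
noisy = add_random_walk_drift(truth)
```

Because the offset is integrated rather than i.i.d., the simulated localization error grows and wanders over a run, which is what makes the drift realistic compared to plain white noise.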
2) Comparison of methods: We conducted 10 simulation runs for each of the following methods with a 5-MAV formation.
• SE 1: Approach of Price et al. [1]
• SE 2: DQMPC (Tallamraju et al. [14])
• SE 3: Our approach
• SE 4: Our approach with all MAVs avoiding emulated virtual obstacles

Each run was 120 seconds long. Figure 9 shows the statistics of this set of experiments. We clearly observe that our proposed approach, with or without static obstacle avoidance, outperforms both state-of-the-art approaches. Moreover, when static obstacles are not present, our approach keeps the person fully in the camera image frame for almost 100% of the experiment duration. It also significantly outperforms all other methods in minimizing the joint uncertainty of the target position estimate.
3) Scalability Experiments: Fig. 10(a) shows the result of target tracking with our method for an increasing number of robots. We conducted 10 simulation trials for each robot-count configuration. The real-time factor for all runs in this experiment was set to 0.1. We validate the scalability of our model predictive controller by running these experiments in an environment with several static, known obstacles (in the form of trees) and perfect communication, for formations of 3 to 16 MAVs. We observe a nearly linear decrease in tracking error with a higher number of robots. At the same time, we notice that the computational requirements did not affect real-time performance for up to 16 robots.

4) Experiments with Communication Failure: In Fig. 10(b), the effect of inter-robot communication failure on tracking is demonstrated for our approach through experiments with 8 MAVs and simulated static obstacles. Communication losses varying from 10% to 100% are simulated. The results were averaged over 3 trials per communication-loss percentage. We observe that tracking gets progressively worse with a higher percentage of communication loss. At 100% communication loss, each robot relies only on its own detections and does not cooperatively improve the target position estimate. Nevertheless, our approach maintains an accurate target state estimate for up to 25% communication loss.
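A common way to emulate the communication losses described above is to drop each inter-robot message independently with a fixed probability. The sketch below illustrates this (the function name, neighbor identifiers, and seed are our own illustrative choices, not the paper's implementation):

```python
import random

def broadcast(estimate, neighbors, loss_prob, rng=None):
    """Send the local target estimate to each neighbor; every message
    is independently dropped with probability `loss_prob`."""
    rng = rng or random.Random(42)
    delivered = {}
    for n in neighbors:
        if rng.random() >= loss_prob:  # random() is in [0, 1)
            delivered[n] = estimate
    return delivered

# At 100% loss nothing arrives: each MAV falls back to its own detection.
assert broadcast([1.0, 2.0, 8.0], ["mav_2", "mav_3"], 1.0) == {}
```

Sweeping `loss_prob` from 0.1 to 1.0 reproduces the experimental protocol: at intermediate loss rates the MAVs fuse whichever neighbor estimates still arrive, which is why tracking degrades gradually rather than abruptly.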

V. CONCLUSIONS AND FUTURE WORK
In this article we introduced a novel robotic front-end for an aerial mocap system consisting of multiple MAVs. We proposed a decentralized, convex-MPC-based algorithm for the MAVs to actively track and follow a moving person in outdoor environments and in the presence of static and dynamic obstacles. In contrast to cooperatively tracking only the 3D positions of the person, the MAVs now actively compute optimal local motion plans, resulting in optimal view-point configurations that minimize the uncertainty in the tracked estimate. We achieved this by decoupling the goal of active tracking into a convex quadratic objective and non-convex constraints corresponding to the angular configurations of the MAVs w.r.t. the person. We showed how this is derived using Gaussian observation model assumptions within the cooperative detection and tracking module. We also showed how we embed all the non-convex constraints, including those for dynamic and static obstacle avoidance, as external control inputs in the MPC dynamics. We evaluated our approach through multiple rigorous real-robot field experiments in an outdoor scenario involving 3 self-built 8-rotor MAVs. These experiments validated our approach and showed that it significantly improves accuracy over previous methods. In simulations we showed results with up to 16 MAVs, demonstrating the scalability of our method as well as its robustness to communication failures. Future work includes extending our method to full-body human pose detection and offline reconstruction of human pose and shape.