Dynamic Task Priority Planning for Null-Space Behavioral Control of Multi-Agent Systems

The task priority planning problem is addressed in the task supervisor of null-space behavioral (NSB) control for multi-agent systems. Traditional methods rely on pre-defined logic-based or fuzzy rules to adjust task priority. In this work, a novel task supervisor is proposed using model predictive control (MPC) techniques. At each sampling instant, the task priority planning problem is formulated as a switching mode optimal control problem (OCP), which can be solved by efficient mixed-integer optimal control algorithms. The optimal task priority order is obtained based on current and predictive information about the agents, without the need for a pre-defined rule. By explicitly introducing slack variables into the constraints, the proposed MPC method can flexibly cope with dynamic obstacles in unknown environments. Simulations with static and dynamic obstacles show that the proposed method provides significantly better control performance than the traditional logic-based method using fewer priority switchings.


I. INTRODUCTION
Because of their agility and versatility, multi-agent systems are widely applied in both military [1], [2] and civilian areas [3]-[5]. Theoretical research on multi-agent systems has also been widely addressed [6], [7]. Autonomous mobile robots are typical multi-agent systems which usually work in an unstructured environment and need to accomplish multiple tasks, such as moving through predefined via-points, avoiding static or dynamic obstacles, forming a formation, and flocking. However, these tasks may conflict; e.g. an autonomous mobile robot cannot move through a predefined via-point while simultaneously avoiding an obstacle near the via-point.
Behavioral control, first proposed by Brooks [8], is a popular control method for handling task conflicts and has been studied intensively in the past decade. The traditional methodology of behavioral control mainly includes the layered control approach [8] and the motor schema control approach [9], [10]. The layered control approach is a competitive approach in which tasks with a lower priority can only be executed after those with a higher priority have been completed. The motor schema control approach is a cooperative approach in which the outputs of all tasks are summed with weights to generate the final task output; in this way, no task is completely achieved. More recently, Antonelli and Chiaverini [11] proposed a cooperative approach named null-space-based behavioral (NSB) control to solve task conflicts by assigning a priority to each task and designing a priority-based task coordination scheme. The NSB method guarantees that tasks with a higher priority are executed completely, while tasks with a lower priority are partially executed via null-space projection. Therefore, the NSB method is naturally well suited to handling task conflicts. However, the priority of tasks, which is determined by the task supervisor, is usually set in advance and fixed during the entire task execution process [12], [13]. This may degrade the control performance of the NSB method and limit its application in dynamic and unknown environments. For instance, setting the obstacle-avoidance task as the top priority when no obstacles are present would degrade the performance of the other tasks.
Some efforts have been made to tackle this problem. A finite state automaton (FSA) is a state transition mechanism in which the system automatically transfers from one state to another based on the current state and state transition trigger conditions. In the context of NSB control, task priority planning is realized by designing the rules for state transition trigger conditions [14]-[17]. Alternatively, task priority planning has been studied by designing fuzzy rules for autonomous systems [15], [18], [19]. However, all the aforementioned methods require humans to design the task priority planning rules in advance. This is difficult when the number of tasks is large and the environment is dynamic and unknown. In addition, task priority planning, using either FSA or fuzzy methods, only makes use of the current state of the agents without future predictions. It should be noted that advanced priority planning and behavior switching algorithms have been studied in the field of robotics and multi-agent systems, but not within the NSB control framework. In [20], a model predictive control (MPC) algorithm was proposed to compute the switching times of robot behaviors, given a switching order known in advance. In [21] and [22], genetic algorithms were exploited for robot task switching and behavior selection, respectively.
In this paper, a novel task supervisor to dynamically plan task priority in NSB control is proposed using MPC techniques. MPC, as an advanced control method, has been employed in robotic and multi-agent systems mainly for trajectory tracking and obstacle avoidance [23], [24]. Here, MPC works as a high-level planner that dynamically provides task priorities. First, different task priority orders form multiple composite tasks. Then, at each sampling instant, a switching mode optimal control problem (OCP) is formulated that explicitly takes into account system dynamics and constraints. The optimal composite task trajectory is obtained by solving the OCP using the current state and future predictions. As a result, task priority planning can be realized on-line without a pre-defined adjustment rule. Finally, simulations with static and dynamic obstacles demonstrate the effectiveness of the proposed method. The predictive nature of the proposed method leads to earlier and smoother switching compared with the traditional logic-based method.
This paper is organized as follows. Section II briefly introduces preliminaries of the NSB control and the basic problem description. In Section III, the MPC based task priority planner, which is the main contribution of the paper, is presented. Section IV presents in detail a simulation in two scenarios. Finally, a conclusion is drawn.

II. PRELIMINARIES AND PROBLEM DESCRIPTION

A. ELEMENTARY TASK
In the NSB control, elementary tasks can be organized in different orders. Task fusion is then designed to obtain different composite tasks with a priority order. Note that the terms ''task'' and ''behavior'' have the same meaning in this paper, because a ''behavior'' is also named a task or mission in behavioral control [13]. An elementary task is encoded by a task variable ρ ∈ R^m, which is a function of the system configuration x ∈ R^n, expressed as

ρ = f(x),    (1)

with the corresponding derivative

ρ̇ = J(x)v,    (2)

where J(x) ∈ R^{m×n} is a configuration-dependent task Jacobian matrix and v ∈ R^n is the stacked vector of the agents' velocities. By inverting the locally linear mapping (2) in the least-squares sense, the reference velocity v_d can be computed as

v_d = J†ρ̇_d,    (3)

where ρ_d is the reference task value and J† = J^T(JJ^T)^{−1} ∈ R^{n×m} is the pseudo-inverse of the Jacobian matrix. In practice, the integration of the reference velocity would result in a numerical drift of the reconstructed agent's position. The following closed-loop inverse kinematics (CLIK) form is used to compensate such drift:

v_d = J†(ρ̇_d + Λρ̃),    (4)

where Λ is a suitable constant positive-definite matrix of gains and ρ̃ = ρ_d − ρ is the task error.
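As a minimal numerical sketch of the CLIK law (4), assuming a hypothetical one-dimensional task over a planar configuration with made-up gain values, the reference velocity can be computed with the Moore-Penrose pseudo-inverse:

```python
import numpy as np

def clik_velocity(J, rho_dot_d, rho_tilde, Lam):
    """Closed-loop inverse kinematics (4): v_d = J^+ (rho_dot_d + Lam rho_tilde)."""
    J_pinv = np.linalg.pinv(J)  # Moore-Penrose pseudo-inverse J^T (J J^T)^-1
    return J_pinv @ (rho_dot_d + Lam @ rho_tilde)

# Hypothetical 1-D task over a planar configuration (m = 1, n = 2)
J = np.array([[1.0, 0.0]])          # task Jacobian
rho_dot_d = np.array([0.5])         # reference task velocity
rho_tilde = np.array([0.2])         # task error rho_d - rho
Lam = np.diag([2.0])                # positive-definite gain matrix
v_d = clik_velocity(J, rho_dot_d, rho_tilde, Lam)  # -> array([0.9, 0. ])
```

The gain Λ trades drift compensation against sensitivity to task-error noise; any positive-definite choice keeps the closed-loop task error convergent.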

B. COMPOSITE TASK
A composite task is a combination of multiple elementary tasks arranged in a prioritized order. Let ρ_b, b ∈ {1, . . . , h}, denote the h elementary tasks, where ρ_b ∈ R^{m_b} and m_b is the dimension of the b-th task space. Define a time-dependent priority function y(b, t) : N_h × [0, ∞) → N_h that maps the task index space to a priority space. Also define a task hierarchical structure which complies with the following rules [25]:
2) The mappings from the velocities to the task velocities are captured by the task Jacobian matrices J_b.
3) The dimension of the task b with the lowest priority may be larger than n − Σ_{c=1}^{b−1} m_c, so that the dimension n need not be greater than the total dimension of all tasks.
4) The value of y(b, t) is assigned by the task supervisor on the basis of the mission's need and sensor information.
By assigning elementary tasks a given priority, the composite task velocity at time t can be expressed as

v_d(t) = v_{d,1}(t) + Σ_{b=2}^{h} [ Π_{c=1}^{b−1} (I − J_c† J_c) ] v_{d,b}(t),    (5)

where tasks are indexed in decreasing order of priority and I − J_c† J_c is the null-space projector of the c-th task.
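The prioritized fusion in (5) can be sketched for an arbitrary number of tasks; the snippet below (an illustrative sketch with hypothetical Jacobians and velocities, not the authors' code) accumulates the null-space projectors of the higher-priority tasks:

```python
import numpy as np

def composite_velocity(tasks):
    """Prioritized task fusion as in (5): `tasks` is a list of (J, v_d)
    pairs in decreasing priority order; each lower-priority velocity is
    projected through the null spaces of all higher-priority tasks."""
    n = tasks[0][0].shape[1]
    v = np.zeros(n)
    N = np.eye(n)                   # accumulated null-space projector
    for J, v_d in tasks:
        v += N @ v_d
        N = N @ (np.eye(n) - np.linalg.pinv(J) @ J)
    return v

# Hypothetical two-task example: a 1-D task (top priority) and a full-rank task
J1, v1 = np.array([[1.0, 0.0]]), np.array([1.0, 0.0])
J2, v2 = np.eye(2), np.array([0.5, 0.5])
v = composite_velocity([(J1, v1), (J2, v2)])   # -> array([1. , 0.5])
```

In the example, the top-priority velocity is realized exactly, while only the component of the second velocity lying in the first task's null space (here the y-axis) survives.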

C. TASK SUPERVISOR
A task supervisor is in charge of switching between the defined composite tasks based on the current state of the agents. Assume that there are M composite tasks which are composed of the same elementary tasks arranged in different prioritized orders. Let v_{d,b}, b = 1, . . . , M represent the velocity of the b-th composite task. The block diagram of a conventional task supervisor is shown in Fig. 1, in which R_{bc}, b = 1, . . . , M, c = 1, . . . , M, b ≠ c represent the task switching trigger conditions designed by humans based on the state of the agents. Once a task switching trigger condition is satisfied, the composite task is switched to another, resulting in an adjustment of the task priority. An example of the traditional task supervisor is priority switching between the motion task and the obstacle-avoidance task, in which the task switching trigger conditions are designed based on logic rules, i.e. if the distance between an agent and an obstacle is smaller than a safe distance, the obstacle-avoidance task has the higher priority; otherwise the motion task has the higher priority.
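The logic-based trigger rule described above can be sketched as a trivial if-else condition (function name and task labels are illustrative, not from the paper):

```python
import numpy as np

def logic_supervisor(agent_pos, obstacle_pos, safe_dist):
    """Trivial logic-based trigger rule: composite task 'A'
    (obstacle-avoidance > motion) when the obstacle is closer than the
    safe distance, composite task 'B' (motion > obstacle-avoidance)
    otherwise."""
    if np.linalg.norm(agent_pos - obstacle_pos) < safe_dist:
        return "A"
    return "B"

near = logic_supervisor(np.array([0.0, 0.0]), np.array([0.5, 0.0]), 1.0)  # 'A'
far = logic_supervisor(np.array([0.0, 0.0]), np.array([5.0, 0.0]), 1.0)   # 'B'
```

Because the rule reacts only to the instantaneous distance, it can chatter between the two composite tasks near the switching boundary, which is exactly the behavior the predictive supervisor proposed below avoids.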

III. DYNAMIC TASK PRIORITY PLANNING
The traditional NSB method usually has a pre-defined and fixed priority order during the entire task execution process, i.e. y is a time-independent function. In this paper, a task priority planning method is proposed using model predictive control techniques.

A. OPTIMAL CONTROL FORMULATION
Consider a group of n (n ≥ 2) agents and a set of h (h ≥ 2) tasks for each agent. Define the state vector x as x(t) = [x_1(t), . . . , x_n(t)]^T, each element of which represents the state of an agent. Let us introduce a binary mode vector w_i(t) ∈ {0, 1}^M for the ith agent. The dynamics of an agent can be written as a convex combination of the dynamics in different modes:

ẋ_i(t) = Σ_{j=1}^{M} w_i^j(t) v_{d,i}^j(t),    (7)

where w_i^j is the jth element of w_i and v_{d,i}^j is the jth composite task velocity, in the form of (5), of the ith agent. The total number of composite task velocities is M = h!, the factorial of the number of tasks h, because h elementary tasks can generate h! composite tasks with different priority orders. An important constraint on w is

Σ_{j=1}^{M} w_i^j(t) = 1,  i = 1, . . . , n,    (8)

which is called a special-ordered-set-of-type-one (SOS1) constraint [26], ensuring exactly one active mode at any time instant.
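A sketch of the mode-selection dynamics (7) under the SOS1 constraint, with made-up candidate velocities:

```python
import numpy as np

def mode_dynamics(w_i, v_candidates):
    """Agent dynamics (7): convex combination of the M composite task
    velocities, with the active mode selected by the SOS1 vector w_i."""
    w_i = np.asarray(w_i, dtype=float)
    assert np.isclose(w_i.sum(), 1.0), "SOS1 constraint violated"
    return w_i @ np.asarray(v_candidates)

# Two tasks -> M = 2! = 2 composite velocities (hypothetical values)
v_candidates = [np.array([1.0, 0.5]),    # priority order: avoidance > motion
                np.array([0.9, 0.0])]    # priority order: motion > avoidance
x_dot = mode_dynamics([1, 0], v_candidates)   # first mode active -> [1. , 0.5]
```

With binary w_i the combination degenerates to picking one candidate velocity; the relaxation of w_i to [0, 1] used later makes the same expression a genuine convex blend.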
Define the tracking error for the ith agent as

ρ̃_i(t) = ρ_d^i(t) − ρ_i(t),    (9)

where ρ_d^i is the reference position for the ith agent. Define a cost function

J = ∫_0^T Σ_{i=1}^{n} ( K_i ‖ρ̃_i(t)‖² + P_i u_i(t)² ) dt,    (10)

which consists of the combined tracking errors of all agents plus slack variables u_i, i = 1, . . . , n, each weighted by positive parameters K_i and P_i. As a result, an OCP can be formulated as

min_{w, u}  J    (11a)
s.t.  x(0) = x̄_0, with ẋ_i(t) given by (7) and (8),    (11b)
      ‖x_i(t) − x_O^i(t)‖ ≥ d_i − u_i(t),    (11c)
      u_i(t) ≥ 0,  i = 1, . . . , n,    (11d)

where x̄_0 is the initial state, x_O^i(t) is the obstacle position detected by the ith agent at time t, d_i is the safety distance lower bound for the ith agent, and T is the prediction horizon. The slack vector is u(t) = [u_1(t), . . . , u_n(t)]^T. Constraint (11d) ensures that all slack variables are non-negative, making (11c) a soft constraint to cope with local minimum problems.
Remark 1: In the case of dynamic obstacles, agents may repeatedly violate the safe distance constraint at multiple sampling instants even though they have already set the obstacle-avoidance task as the top priority. This case is more complicated than the traditional local minimum problem, where agents stop moving because of two opposite velocities from different tasks. For the traditional method, additional mechanisms, such as adding small measurement noise [25] or introducing human intervention [27], are required to handle the local minimum problem. The proposed MPC method in the form of (11), however, can handle dynamic obstacles and local minimum problems explicitly and systematically by incorporating constraints (11c) and (11d). In this way, agents can temporarily violate the safety distance constraint in an optimal way, depending on the current situation, and return to other behaviors when possible.

B. REAL-TIME MODEL PREDICTIVE CONTROL ALGORITHM
During the task execution process, problem (11) must be solved repeatedly with a different initial state x̄_0 updated at every sampling instant, leading to a model predictive control (MPC) framework. However, problem (11) contains integer variables, which makes it difficult to solve. In this paper, we employ a real-time MPC algorithm [26] that can efficiently solve (11) on-line, providing dynamic task priority planning. The algorithm is a direct method following the ''first discretize, then optimize'' methodology and uses an outer convexification with integer relaxation technique.
Divide the domain [0, T] into N intervals, characterized by equidistant grid points t_k = k∆t, k = 0, . . . , N, with ∆t = T/N. In each interval, the binary variables w are assumed to be constant; hence they can only change values at grid points. The binary variables w are relaxed to real variables satisfying ŵ(t) ∈ [0, 1]^{nM}. Using multiple shooting [28], problem (11) can be converted to a nonlinear programming problem (NLP):

min_{ŵ, u, x}  Σ_{k=0}^{N−1} Σ_{i=1}^{n} ( K_i ‖ρ̃_{i,s|k}‖² + P_i u_{i,s|k}² )    (12a)
s.t.  x_{s|0} = x̄_0,    (12b)
      x_{s|k+1} = φ(x_{s|k}, ŵ_{s|k}),  k = 0, . . . , N − 1,    (12c)
      ‖x_{i,s|k} − x_{O,s|k}^i‖ ≥ d_i − u_{i,s|k},    (12d)
      u_{i,s|k} ≥ 0,  Σ_{j=1}^{M} ŵ_{i,s|k}^j = 1,    (12e)

where s is the current sample, x_{s|k} is the k-step predicted state at sample s, x_{i,s|k} is the ith element of x_{s|k} given by (7), and u_{i,s|k} is the ith slack variable at sample s. An illustration of this discretization at the initial guess is shown in Fig. 2. In (12), the function φ : R^{n n_x} × [0, 1]^{nM} → R^{n n_x} is obtained by solving the nonlinear dynamics (7) via numerical integration, e.g. with the Euler method. Constraint (12c) ensures that the state trajectory is continuous over the entire prediction horizon at the solution of (12) after convergence is achieved. An illustration of the state and mode trajectories at the solution is shown in Fig. 3. The NLP (12) can be solved efficiently by standard NLP solvers using sequential quadratic programming or interior point methods, without resorting to integer optimization algorithms. After solving (12), a sum-up-rounding (SUR) step is employed to obtain the binary variables W from Ŵ [26]. The SUR step sets, in each interval [t_k, t_{k+1}),

w_i^j(t) = 1 if j maximizes ∫_0^{t_{k+1}} ŵ_i^j(τ) dτ − Σ_{l=0}^{k−1} w_i^j(t_l)∆t, and 0 otherwise,    (13)

so that w_i(t) satisfies Σ_{j=1}^{M} w_i^j(t) = 1. Proposition 1, given as Theorem 5 in [29], bounds the SUR approximation error by a constant multiple of the grid length,

‖∫_0^t ( ŵ_i(τ) − w_i(τ) ) dτ‖_∞ ≤ C∆t,    (14)

which means that the approximation error of the SUR step can be made arbitrarily small by choosing a sufficiently small grid length ∆t. In practice, the upper bound in (14) is often conservative, leading to an even smaller approximation error. In terms of computational cost, the SUR step is negligible compared to solving (12).
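The SUR step (13) can be sketched as follows. This is a generic SOS1-preserving sum-up-rounding implementation, not the authors' code; it activates in each interval the mode whose accumulated relaxed mass most exceeds the binary mass committed so far:

```python
import numpy as np

def sum_up_rounding(w_relaxed, dt):
    """Generic SUR-SOS1: round relaxed modes (N x M array, rows sum to 1)
    to binary modes, activating in each interval the mode with the largest
    accumulated deficit so the SOS1 constraint is preserved."""
    N, M = w_relaxed.shape
    w_bin = np.zeros((N, M))
    accum = np.zeros(M)          # integral of (relaxed - binary) so far
    for k in range(N):
        accum += w_relaxed[k] * dt
        j = int(np.argmax(accum))
        w_bin[k, j] = 1.0
        accum[j] -= dt
    return w_bin

w_relaxed = np.array([[0.6, 0.4], [0.4, 0.6], [0.2, 0.8]])
w_bin = sum_up_rounding(w_relaxed, dt=1.0)   # -> [[1,0], [0,1], [0,1]]
```

Note how the binary trajectory tracks the relaxed one in an integral sense: each row stays a valid SOS1 selection, and the running deficit `accum` is what the bound (14) controls.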
In MPC, only the first element of the solution is fed back to the system. At the next sampling instant, the NLP (12) and the SUR step (13) are executed again using the latest state measurement. The outline of the MPC algorithm is given in Algorithm 1.
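The receding-horizon loop of Algorithm 1 can be sketched as below; `solve_relaxed_nlp` and `apply_mode` are hypothetical placeholders for the NLP solver of (12) and the plant update, respectively:

```python
import numpy as np

def sur_first_mode(w_relaxed, dt):
    """SUR restricted to the first interval, since only w_bin[0] is applied."""
    mode = np.zeros(w_relaxed.shape[1])
    mode[int(np.argmax(w_relaxed[0] * dt))] = 1.0
    return mode

def mpc_loop(x0, solve_relaxed_nlp, apply_mode, n_steps, dt):
    """Receding-horizon loop: re-solve the relaxed NLP from the latest
    state, round via SUR, and feed back only the first mode."""
    x, modes = x0, []
    for _ in range(n_steps):
        w_relaxed = solve_relaxed_nlp(x)       # relaxed modes, shape (N, M)
        mode = sur_first_mode(w_relaxed, dt)
        modes.append(mode)
        x = apply_mode(x, mode)                # advance the plant one sample
    return np.array(modes)
```

Because only the first mode is applied before re-solving, modeling errors and newly detected obstacles are absorbed at the next iteration, which is what gives the supervisor its on-line re-planning behavior.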
Remark 2: The proposed MPC is by design a centralized planner. However, since each agent is independent, the dynamics (7) are usually separable. As a result, problem (11) can be split into n smaller problems, one per agent, if no global tasks (e.g. a global formation task) are present.

IV. SIMULATION
In this section, simulations are conducted where three robots perform the motion task and the obstacle avoidance task on the ground. Comparisons are made between the proposed MPC task priority planner and the traditional logic-based planner with both static and dynamic obstacles.

A. TASK DESIGN
We consider two elementary tasks, namely the motion task and the obstacle-avoidance task. They can form two composite tasks with different priority orders. Other elementary tasks, such as the centroid and rigid formation tasks [12], could also be considered but are omitted here for the sake of simplicity.

1) MOTION TASK
In the motion task, robots are driven toward a target point along a pre-determined reference trajectory. The corresponding task function can be defined as

ρ_B = p,    (15)

where p = [p_x, p_y]^T is the robot position coordinate. The desired motion task velocity can be calculated as

v_B = J_B†(ρ̇_{B,d} + Λ_B ρ̃_B),    (16)

where J_B = I_2 is the Jacobian matrix of the motion task and I_2 is the identity matrix; J_B† is the pseudo-inverse of J_B (since J_B is symmetric and idempotent, J_B† = J_B); ρ_{B,d} is the desired task function; Λ_B is a suitable constant positive-definite matrix of gains; ρ̃_B = ρ_{B,d} − ρ_B is the motion task error.

2) OBSTACLE-AVOIDANCE TASK
In the obstacle-avoidance task, robots must avoid obstacles detected by their sensors along the reference trajectory. The corresponding task function can be defined as

ρ_A = ‖p − p_O‖,    (17)

where p_O = [p_{xO}, p_{yO}]^T is the obstacle position coordinate. The desired obstacle-avoidance task velocity can be calculated as

v_A = J_A† Λ_A ρ̃_A,    (18)

where J_A = (p − p_O)^T/‖p − p_O‖ is the Jacobian matrix of the obstacle-avoidance task; Λ_A is a suitable constant positive-definite gain; ρ̃_A = d − ‖p − p_O‖ is the obstacle-avoidance task error; d is the safe distance.
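A small numerical sketch of (17)-(18), with hypothetical positions and gain, shows the avoidance velocity pointing radially away from the obstacle:

```python
import numpy as np

def avoidance_velocity(p, p_obs, d_safe, lam):
    """Obstacle-avoidance task velocity (18): push the robot radially
    away from the obstacle until the safe distance is restored."""
    diff = p - p_obs
    dist = np.linalg.norm(diff)
    J_A = (diff / dist).reshape(1, -1)    # 1 x 2 task Jacobian from (17)
    err = d_safe - dist                   # task error rho_tilde_A
    return (np.linalg.pinv(J_A) * lam * err).ravel()

# Robot 0.5 m from the obstacle, safe distance 1 m, unit gain
v_A = avoidance_velocity(np.array([0.5, 0.0]), np.zeros(2), 1.0, 1.0)  # [0.5, 0.]
```

Since the task is one-dimensional (m = 1, n = 2), its null space is the tangential direction around the obstacle, which is exactly the direction left free for the lower-priority motion task in composite task A.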

3) COMPOSITE TASKS
In the first composite task, the obstacle-avoidance task has the higher priority, leading to the task velocity

v = v_A + (I − J_A† J_A) v_B,    (19)

where I − J_A† J_A is the null-space projector of the obstacle-avoidance task. In the second composite task, the motion task has the higher priority, leading to the task velocity

v = v_B + (I − J_B† J_B) v_A,    (20)

where I − J_B† J_B is the null-space projector of the motion task.

(Figure: active mode of the robots during the task execution process when static obstacles are present. Composite task A: obstacle-avoidance task > motion task; composite task B: motion task > obstacle-avoidance task.)

B. SIMULATION CONFIGURATION
The objective of the motion task is to move the robots from their initial points to the target points. The pre-defined reference trajectories of the three robots, together with the other parameters used in the simulation, are given in Table 1. The proposed MPC algorithm is compared against a traditional task supervisor based on a trivial switching logic, i.e. if the distance between the robot and the closest obstacle is smaller than the safe distance, the robot performs composite task A; otherwise composite task B is performed. Both methods are implemented with the same parameters in Matlab R2020a on a laptop running Windows 10 with an AMD R5 4600H CPU at 3.00 GHz and 16 GB memory. The NLP (12) is solved by Ipopt [30] via the CasADi toolbox [31]. In all our simulations, the MPC algorithm has an average runtime of 38 ms and a maximum of 49 ms. In contrast, the logic-based method has a negligible execution time, as it merely executes an if-else based piece of code. This demonstrates that the MPC algorithm is able to run in real time even with a straightforward Matlab implementation.

(Figure: active mode of the robots during the task execution process when dynamic obstacles are present. Composite task A: obstacle-avoidance task > motion task; composite task B: motion task > obstacle-avoidance task.)
In the next subsections, we show that the proposed MPC algorithm significantly outperforms the logic-based one at the cost of this increased on-line computational burden.

1) STATIC OBSTACLES
In the first scenario, static obstacles are present at positions given in Table 1. The trajectories of the proposed MPC and the logic-based method are shown in Fig. 4. The distances between the robots and the obstacles are shown in Fig. 5. The tracking error is shown in Fig. 6.
It can be seen that the proposed MPC method significantly outperforms the logic-based method at the same sampling frequency. The trajectory of the composite tasks is shown in Fig. 7. The logic-based method switches frequently to avoid obstacles, leading to oscillating trajectories and violations of the safe distance when encountering obstacles. In contrast, the proposed MPC method smoothly finishes the tasks using only two priority switchings, without violating the safe distance constraint. This can be explained by the fact that the MPC method is able to predict obstacles in advance and hence prepare priority switching earlier (e.g. see Fig. 7: the MPC method always switches to composite task A earlier than the logic-based method). This demonstrates the effectiveness of the proposed MPC algorithm and its advantages over the traditional logic-based method.

2) DYNAMIC OBSTACLES
In the second scenario, we introduce a dynamic obstacle in the path of robot 1, defined by

(p_{xO}, p_{yO}) = (10, 2.14t) m,    (24)

meaning that the obstacle moves in the y-direction at p_x = 10 m with a speed of 2.14 m/s. At each sampling instant, the proposed MPC method detects the position of the dynamic obstacle and makes predictions using this information, without knowing the future positions of the dynamic obstacle. The trajectories of the proposed MPC and the logic-based method are shown in Fig. 8. The distances between the robots and the obstacles are shown in Fig. 9. The trajectory of the task priority is shown in Fig. 11. The tracking error is shown in Fig. 10. It can be observed that the logic-based method switches frequently but fails to avoid the dynamic obstacle, leading to significant violations of the safety distance constraint. The proposed MPC method only temporarily and slightly violates the safety distance constraint at around t = 5.2 s, and quickly moves away from the obstacle thereafter (see Fig. 9). This is consistent with the mode switching behavior shown in Fig. 11, where the MPC planner switches to composite task A after t = 4.2 s but switches to composite task B at around t = 5.1 s, when it finds that the obstacle is also moving in its direction. The MPC planner then switches back to composite task A immediately to escape from the obstacle. This switching behavior is due to the prediction capability of MPC to avoid obstacles in advance. The slight constraint violation is due to the slack variables introduced in (11) to avoid frequent oscillations and switchings. To further understand how robot 1 avoids the dynamic obstacle, Fig. 12 shows the trajectories of robot 1 at different time points using both methods. Robot 1 detects the dynamic obstacle at around t = 4.3 s with both methods. The proposed MPC planner immediately switches from composite task B to A, while the logic-based method reacts more slowly.
At t = 4.5 s, the MPC planner decides to bypass the obstacle from the top (positive y direction). Then, at t = 4.7 s, the MPC planner predicts that the safety distance constraint would be violated and re-plans the trajectory to bypass the obstacle from the bottom (negative y direction). In contrast, the logic-based planner is too close to the obstacle at many time instants, constantly switching between the two composite tasks. This example demonstrates the prediction capability and flexibility of the proposed MPC planner in unknown and dynamic environments.

V. CONCLUSION
In the NSB control for multi-agent systems, traditional methods rely on pre-defined logic-based or fuzzy rules to dynamically plan the task priority to obtain a composite task. This paper has proposed a novel MPC-based task supervisor to plan the task priority using current and predictive information. At each sampling instant, the task priority planning problem has been formulated as a switching mode OCP, which can be solved by an efficient mixed-integer optimal control algorithm. Simulation results with static and dynamic obstacles have demonstrated the effectiveness of the proposed method in dynamic and unknown environments and its advantages over the traditional logic-based method.