HDP Algorithms for Trajectory Tracking and Formation Control of Multi-Agent Systems

A heuristic dynamic programming (HDP) algorithm for trajectory tracking and formation control of multi-agent systems (MAS) is presented in this paper. The selected HDP method allows for an online optimal control design. The multi-agent control problem is formulated as a leader-follower problem, where the followers must maintain the assigned formation while the leader follows the specified trajectory. The developed QR-Solver and RLSµ-QR-HDP-DLQR algorithms provide a new methodology for obtaining the solution of the Hamilton-Jacobi-Bellman (HJB) equation. These proposed solutions are based on QR factorization, which decreases the computational cost and avoids the numerical stability issues of the conventional least squares (LS) method. The algorithms' performance, including convergence, numerical stability, and computational metrics, was experimentally evaluated for two control systems: aerial altitude control (one degree of freedom) and a terrestrial robot (two degrees of freedom).


I. INTRODUCTION
To conduct large-scale tasks, such as inspection of electric power transmission lines, monitoring leaks in gas and oil pipelines, country border patrolling, agriculture, and transportation, the use of multiple robots or agents is more feasible and economical than using a single agent [1], [2]. The control tasks are carried out by multiple agents simultaneously, and new control algorithms must support such a framework [3].
A multi-agent system (MAS) can be defined as a group of autonomous agents that interact with each other and share the same environment, which is perceived through sensors, and in which they act by carrying out certain actions [4]. In a multi-agent framework, two or more agents interact and work together to conduct certain tasks or satisfy a set of goals [5]. Scientific research and practical implementations of MAS are focused on the development of standards, principles, and models that allow the creation of groups of autonomous or semi-autonomous agents, capable of interacting properly to accomplish common objectives [6].
When working with multiple mobile agents such as drones or robots, it is often necessary to establish a formation between the agents [7]. Formations of agents can be characterized by geometrical patterns to be applied by a multi-agent team. Formations appear in biological systems, such as the well-known V-shaped flight formation used by geese and other large migratory birds, which allows a flock a longer flight range than an individual bird. Formation control remains one of the most challenging problems in the control of MAS [8], [9].
Formation control of multiple agents is a challenging task due to environmental disturbances such as wind and various obstacles, leading to instabilities [10], [11]. There is a need to develop an online control method that provides optimal control solutions for the dynamic behavior of agents without full knowledge of the system parameters [12], [13].
A promising solution to solve the online control problem with plant uncertainties is the adaptive dynamic programming (ADP) method [14], [15]. This approach employs a 'forward-in-time' mechanism that looks for an optimal control policy by successively adapting two parametric structures, i.e., an action network and a critic network, to approximate the solution of the Hamilton-Jacobi-Bellman (HJB) equation [16], [17].
In ADP, the action network calculates the control actions, whereas the critic network learns to approximate the value function. This function evaluates the effect of control on the future performance and provides guidelines on how to improve the control law. These structures are combined with reinforcement learning (RL), which has been used as a framework for solving real-time optimal decision problems [18], [19].
In MAS, trajectory tracking and formation control can be treated in isolation. In the proposed control system, however, the trajectory and the formation are modeled in a single dynamic system: the trajectory reference is defined by the leader (leader-follower scheme), and the formation reference is established according to the specific application.
The algorithm proposed in this work is intended for execution on the microcontrollers present in the agents/robots of the MAS. The main objective of this work is to design a multi-agent control system using an HDP architecture for real-time implementation that guarantees the convergence and stability of the closed-loop system while providing disturbance rejection.
This study presents two ADP algorithms for multi-agent systems: the QR-solver (developed in this research) and the RLSµ-QR-HDP-DLQR (initially developed for single-agent systems), as described in [20], [21]. We compare their performance, e.g., gain convergence, by analyzing the parameter θ of the DLQR and the convergence of the trajectory and desired formation. The contribution of this study is the development of optimal control algorithms for MAS, focusing on embedded systems (terrestrial and aerial robots) with specifications that guarantee numerical stability, fast convergence time, and low memory use and computational cost, thus ensuring applicability in real-time control.
The main advantage of using the optimal control approach over classical approaches to MAS, such as consensus, is the ability to provide a solution for time-varying systems, where the dynamics of the agent and the environment can vary due to disturbances or noise [22], [23]. The optimal control approach also allows the optimization of a desired performance metric by minimizing a specific cost function. This paper is organized as follows: In Section II, the problem of trajectory tracking and formation (the distributed forms that agents/robots should take in space) for MAS is described, and the optimal solution for trajectory control and MAS formation is then derived. In Section III, the two control algorithms used in this study, i.e., the QR-Solver and the RLSµ-QR-HDP-DLQR, are described, and a brief analysis of computational complexity is introduced. Section IV contains computational simulations of these algorithms. Finally, some concluding remarks regarding the results of this work are presented.

II. PRELIMINARIES
Here we present preliminaries related to the trajectory tracking and formation control problem for MAS. We use a state-space model for a subsystem referred to as the leader and an exosystem referred to as the followers. Then, the optimal control solution is derived by minimizing a value function for the trajectory tracking and formation control of MAS.
The formation control problem can be defined by a shape or a relative state. The first component of MAS dictates the position of the leader, and the second component of MAS corresponds to the overall formation (followers) [24]. The dynamics of a MAS is represented by three sets of differential and algebraic equations that correspond to the leader, followers, and the formation error model.

A. LEADER MODEL
The leader agent model is given in the state-space form

v̇ = Ev + Hw,

where v ∈ R^q is the state vector of the leader, and E ∈ R^{q×q} and H ∈ R^{q×q} are the state and input matrices of the leader, respectively. The agent input w ∈ R^q is defined as w = ŵ + w̃, where ŵ is a feedback control input, with leader gain K_l, that drives the leader toward its desired trajectory v*, which satisfies v̇* = Ev*, and w̃ is the external disturbance input (such as a wind gust for aerial agents or slip for terrestrial agents).
Assumption 1: The disturbance w̃ is upper bounded by W, such that ‖w̃(t)‖ ≤ W.
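As a toy illustration of the leader dynamics above, the sketch below simulates v̇ = Ev + Hw under a norm-bounded disturbance (Assumption 1). The matrices E, H, the gain K_l, the feedback structure ŵ = K_l(v* − v), and the bound W are illustrative assumptions, not values from the paper:

```python
import numpy as np

q = 2
E = np.array([[0.0, 1.0],                # leader state v = [position, velocity]
              [0.0, 0.0]])
H = np.eye(q)
K_l = np.array([[0.0, 0.0],              # hypothetical leader feedback gain
                [4.0, 2.8]])
W = 0.05                                 # disturbance bound of Assumption 1

dt, steps = 0.01, 2000
v = np.zeros(q)                          # leader state
v_star = np.array([1.0, 0.0])            # constant setpoint; satisfies v*' = E v*

rng = np.random.default_rng(0)
for _ in range(steps):
    w_tilde = rng.uniform(-1.0, 1.0, q)
    w_tilde *= W / max(np.linalg.norm(w_tilde), 1e-12)  # enforce ||w_tilde|| <= W
    w_hat = K_l @ (v_star - v)           # assumed feedback structure
    v = v + dt * (E @ v + H @ (w_hat + w_tilde))        # Euler step of v' = Ev + Hw
```

With the bounded disturbance, the leader settles in a neighborhood of v* whose size is proportional to W, which is the practical content of Assumption 1.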
Remark 1: The previous assumption specifies the upper bound on the disturbance and is standard in the literature [25], [26]. It is not very restrictive, as most disturbances encountered in practice, such as wind gusts, friction, slip, and other unmodelled effects, are upper bounded.

B. FOLLOWERS-AGENTS MODEL
The dynamics of the followers is given by

ẋ_i = A_i x_i + B_i u_i + D_i v,

where x_i ∈ R^{n_i} is the state of agent i; A_i, B_i, and D_i are the dynamics matrices of agent i; u_i is the control signal; and v is the state of the leader, for i = 1, 2, ..., N. The formation error e_i is given by

e_i = y_i − y_i* = C_i x_i − F_i v,

where C_i is the output matrix of agent i and F_i is the formation matrix. The output of each agent is y_i = C_i x_i ∈ R^{n_i}, and y_i* = F_i v is the reference (expected) trajectory for each agent.

C. CONTROL OF AN AGENT
A decentralized, closed-loop controller, proposed in [27] and used in this paper, is given by

u_i = K_i [x_i^T  z_i^T]^T = K_xi x_i + K_zi z_i,

where K_i = [K_xi  K_zi] is the gain of the i-th agent, with K_xi ∈ R^{q×q} and K_zi ∈ R^{q×q}. The internal model z_i ∈ R^q of the i-th follower is given by

ż_i = G_1 z_i + G_2 e_i,

where the characteristic polynomial of G_1 is the same as the minimal polynomial of E, and the pair (G_1, G_2) is controllable and incorporates an internal model of the matrix E.

D. OPTIMAL PROBLEM FORMULATION
To control the multi-agent system, we developed an optimal controller that ensures stability of the system under disturbances. Moreover, when v ≡ v*, the developed controller is optimal in the sense that it minimizes the cost function

J_i = Σ_{k=0}^{∞} ( x̄_{i,k}^T Q_i x̄_{i,k} + u_{i,k}^T R_i u_{i,k} ),

with weighting matrices Q_i ⪰ 0 and R_i ≻ 0. The trajectory tracking and formation control methodologies are designed for a data-driven distributed controller using ADP, under a switching network topology. The developed approach can approximate the control gains K_i* for each follower without relying on knowledge of the system matrices A_i, B_i, and D_i. The internal model (5) is modified by replacing the formation error with the estimate ê_i = y_i + F_i ζ_i, where ζ_i ∈ R^q is a distributed estimate related to the leader state, with ζ_0 = v. Augmenting the agent state ξ_i with the internal model of (8) yields the i-th augmented subsystem, with augmented matrices Ā_i and B̄_i, for i = 1, 2, ..., N.

E. OPTIMAL SOLUTION FOR TRAJECTORY AND FORMATION CONTROL
Using the iterative technique for Riccati equation computation of [28], one can determine P_i* and K_i* by successively solving, for k = 0, 1, 2, ...,

(Ā_i − B̄_i K_i^k)^T P_i^k (Ā_i − B̄_i K_i^k) − P_i^k + ∆ = 0,
K_i^{k+1} = (R_i + B̄_i^T P_i^k B̄_i)^{−1} B̄_i^T P_i^k Ā_i,

where ∆ = Q_i + (K_i^k)^T R_i K_i^k. Using the Kronecker product, the cost matrices of Eq. (11) can be rewritten in the linear-in-the-parameters form

Φ_i^k θ_i^k = y_i^k,   θ_i^k = vec(P_i^k),

where Φ_i^k and y_i^k are assembled from measured state and cost data, and the data matrices Γ and δ collect these samples over a batch of time steps. Equation (12) thus provides an update of the optimal gain K as well as of the parameter P of the Riccati equation. The complete closed-loop dynamics for the MAS, and the corresponding closed-loop error, follow from substituting the control law into the agent and leader models; more details and the proof of convergence can be found in [27].
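The iterative Riccati technique can be sketched as follows, assuming a generic discrete-time pair (A, B) and weights (Q, R) rather than the paper's agent models; each iteration solves the Lyapunov equation via the Kronecker-product vectorization and then improves the gain:

```python
import numpy as np

# Hewer-style policy iteration for the discrete Riccati equation.
# A, B, Q, R, and the initial gain are illustrative, not the paper's models.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.array([[1.0]])

K = np.array([[1.0, 1.0]])                # hypothetical stabilizing initial gain
n = A.shape[0]
for _ in range(50):
    Ac = A - B @ K
    Delta = Q + K.T @ R @ K               # the paper's  Δ = Q + Kᵀ R K
    # vec form of the Lyapunov equation  Acᵀ P Ac − P + Δ = 0:
    M = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    P = np.linalg.solve(M, Delta.flatten(order='F')).reshape((n, n), order='F')
    # policy improvement:  K ← (R + Bᵀ P B)⁻¹ Bᵀ P A
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

At convergence, P satisfies the discrete algebraic Riccati equation and K is the corresponding DLQR gain; the ADP algorithms in Section III replace the model-based Lyapunov solve with the data-driven LS problem Φθ = y.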

III. ALGORITHMS FOR TRAJECTORY AND FORMATION CONTROL
In this section, two algorithms for trajectory tracking and formation control of multi-agent systems are presented. The QR-solver algorithm is based on the LS approach, and the RLSµ-QR-HDP-DLQR algorithm uses recursive least squares (RLS). The restrictions and metrics associated with the analysis of computational complexity in the implementation of the control algorithms are also presented.

A. QR-SOLVER ALGORITHM
QR factorization has an important application in solving LS problems [29], [30]. Often, a matrix is ill-conditioned, i.e., extremely sensitive to errors owing to the approximations that occur in numerical computations [31].
The main advantage of QR factorization is that there is no need to compute the inverse of the variance or covariance matrices. Instead, it is only necessary to invert the triangular factor R and transpose Q; the product R⁻¹Qᵀy then yields the LS coefficients. The QR factorization method is numerically stable and can be applied to any type of matrix. For real-time applications, poor conditioning of the variance or covariance matrices of LS can lead to agent/robot instabilities, causing a malfunction of the overall MAS. To overcome this problem, an algorithm, called QR-solver, was developed using the Householder transformation to compute the Q and R factors of the least squares solution [32]. The QR-solver is described next.
Every non-singular Φ_i^k ∈ R^{n×n} has a QR factorization

Φ_i^k = Q_i^k R_i^k,

where Q_i^k ∈ R^{n×n} is an orthogonal matrix and R_i^k ∈ R^{n×n} is an upper triangular matrix with positive diagonal elements. The inverse is given by

(Φ_i^k)^{−1} = (R_i^k)^{−1} (Q_i^k)^T.

Applying this algebraic manipulation to the solution of (12), with Φ_i^k non-singular, one has

θ_i^k = (R_i^k)^{−1} (Q_i^k)^T y_i^k,

where R_i^k is an upper triangular matrix whose inverse can easily be found (by back substitution) without computing a determinant.
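The QR-solver step can be sketched as below; Φ and y are random stand-ins for the data of Eq. (12), and NumPy's Householder-based QR plays the role of the factorization:

```python
import numpy as np

# Solve Φ θ = y via QR: factor Φ = Q R, then back-substitute θ = R⁻¹ Qᵀ y,
# never forming Φ⁻¹.  Φ and y are illustrative stand-ins for Eq. (12).
rng = np.random.default_rng(1)
Phi = rng.standard_normal((10, 10))
y = rng.standard_normal(10)

Q, R = np.linalg.qr(Phi)          # Householder QR under the hood
rhs = Q.T @ y

# back substitution on the upper-triangular R (no determinant, no inverse)
theta = np.zeros_like(rhs)
for i in range(len(rhs) - 1, -1, -1):
    theta[i] = (rhs[i] - R[i, i + 1:] @ theta[i + 1:]) / R[i, i]
```

The back-substitution loop costs O(n²), versus O(n³) for an explicit inverse, which is part of the computational-cost argument in Section III-C.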
The QR factorization procedure for computing the parameters P_i^k (P of the Riccati equation), K_i^k (gain of the i-th follower), and D_i^T P_i of Eq. (21) is presented in ALGORITHM 1 (called QR-SOLVER). ALGORITHM 1 proposes a computational structure with three main functional blocks. The first block, steps 1-5, is the setup of the algorithm, with the initialization of all system dynamics matrices, initial positions and velocities, gain matrices, simulation parameters, and state vectors. The second block establishes the formation topology in step 8; the data acquisition of the system is carried out in steps 9-12, which are based on the corresponding equations: step 9 applies Eq. (4), step 10 uses Eq. (17), step 11 applies Eq. (1), and step 12 uses Eq. (17). The control update of agent i, represented by steps 16-19, is the third functional block. The calculation of Φ_i in Eq. (13) is carried out in step 16, the value of y_i in step 17 is calculated using Eq. (14), step 18 performs the LS solution of Eq. (12), and the new K_i value is obtained in step 19. We propose improvements in the numerical stability of the algorithm without compromising the convergence time.

B. RLSµ-QR-HDP-DLQR ALGORITHM
According to the standard LS estimator, the optimal value of the parameter vector θ̂_i is given by

θ̂_i = Φ_i^{−1} ω_i.

Let Φ_i be expressed in its factored form

Φ_i = Φ_i^{1/2} Φ_i^{H/2},

where Φ_i^{1/2} is a lower triangular matrix defined as the square root of Φ_i, and Φ_i^{H/2} is the Hermitian transpose of Φ_i^{1/2}. Note, from the RLSµ-HDP-DLQR algorithm, that the recursive updating equation of the correlation matrix Φ_i is

Φ_i = µΦ_{i−1} + x̄_i x̄_i^H,

where µ is the forgetting factor. Pre-multiplying both sides of Eq. (23) by the matrix Φ_i^{−1/2} defines a new vector variable. We thus obtain a new form for the cross-correlation vector between the input x̄_i and the expected output d_i, updated recursively as

ω_i = µω_{i−1} + x̄_i d_i*,

where the asterisk denotes a complex conjugate. Eqs. (24) and (27) are then written in their Hermitian transposed (square-root) forms, expressing each of the terms on the right-hand side of each equation in transposed form, and the update is arranged as a pre-array. Suppose one chooses a unitary rotation Θ_i transforming this pre-array to produce a zero block in the second entry of the top-row block of the post-array:

A_i Θ_i = B_i,

where A_i is the pre-array and B_i the post-array. To evaluate the unknown blocks b_21i, b_31i*, and b_32i* of the post-array B, the procedure is to square both sides of Eq. (30). According to the matrix factorization lemma [31], Θ_i is a unitary matrix; therefore, Θ_i Θ_i^H is the identity matrix for every i. Thus, Eq. (30) can be transformed into A_i A_i^H = B_i B_i^H, or its expanded form. Comparing the respective terms on both sides of Eq. (33), the unknown block elements of the post-array B are determined.
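A minimal sketch of the square-root (QR) RLS update in pre-/post-array form is given below; a single QR factorization of the pre-array stands in for the unitary rotation Θ_i, and the forgetting factor and toy regression data are assumptions:

```python
import numpy as np

# Square-root RLS: maintain S with Sᵀ S = Φ and z with Sᵀ z = ω.
# Rotating (here: QR-factoring) the pre-array updates both at once,
# without ever forming Φ or its inverse explicitly.
def qr_rls_update(S, z, x, d, mu):
    n = x.size
    pre = np.zeros((n + 1, n + 1))        # pre-array A_i
    pre[:n, :n] = np.sqrt(mu) * S
    pre[:n, n] = np.sqrt(mu) * z
    pre[n, :n] = x
    pre[n, n] = d
    _, post = np.linalg.qr(pre)           # post-array B_i (upper triangular)
    return post[:n, :n], post[:n, n]

rng = np.random.default_rng(2)
theta_true = np.array([1.0, -2.0, 0.5])   # toy regression target
S = np.eye(3) * 1e-3                      # small regularizing initialization
z = np.zeros(3)
for _ in range(200):
    x = rng.standard_normal(3)
    d = x @ theta_true                    # noiseless output for the demo
    S, z = qr_rls_update(S, z, x, d, mu=0.95)

theta = np.linalg.solve(S, z)             # back-substitution (S is triangular)
```

Because preᵀpre = postᵀpost under the rotation, the invariants Sᵀ S = Φ_i and Sᵀ z = ω_i are preserved exactly, so θ = S⁻¹z recovers the forgetting-factor LS estimate at every step.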
The resulting post-array blocks b_21i, b_31i*, and b_32i* are expressed in terms of the a priori estimation error ξ_i = d_i − θ_{i−1}^H x̄_i and a real parameter ρ_i [33]. The equations in this subsection form ALGORITHM 2, labeled RLSµ-QR-HDP-DLQR. ALGORITHM 2 has three main functional blocks. The first block, steps 1-6, sets up and initializes the algorithm, i.e., all system dynamics matrices, the initial positions and velocities, the gain matrices, the simulation parameters, the state vectors, and the forgetting factor of the RLS. The second block establishes the formation topology in step 9, and the data acquisition of the system is represented by the model in steps 10-13; these steps are based on the following equations: step 10 applies Eq. (4), step 11 uses Eq. (17), step 12 utilizes Eq. (1), and step 13 applies Eq. (17). The control update of agent i, represented by steps 15-22, is the third functional block. In step 15, the vector x̄_i is assembled using the Kronecker product of the state vector x_i. In step 16, the target vector of the ADP is formed. In step 17, the matrix A_i is assembled according to Eq. (31). In step 18, the QR factorization is applied, generating the matrices Q_i and R_i according to Eq. (34). In step 19, the values of Φ and ω are defined, and the calculation of θ̂_i in step 20 is performed using Eq. (25). In step 21, the matrix Pθ is formed with the elements of the vector θ̂_i, and step 22 applies the update of the gain K_i with the solution of the HJB-Riccati equation.

C. COMPUTATIONAL COMPLEXITY
The amount of resources required to run an algorithm is closely related to its computational complexity [34]. For real-time applications with embedded systems (robots), the complexity analysis can be performed using metrics that must be minimized or satisfied. To evaluate the performance of the proposed algorithms, computational complexity metrics are established in terms of numerical stability issues (including ill-conditioned time vectors), the number of operations, the storage scaling, and the dynamic system critical time.

1) Numerical Stability
Numerical stability analyzes how errors introduced during the execution of an algorithm affect the result. It is a property of an algorithm rather than of the problem being solved [35]. In computing, numerical stability refers to how perturbations of the input affect the execution of an algorithm: in a numerically stable algorithm, errors in the input lessen in significance as the algorithm executes, having little effect on the final output [36].
Some numerical algorithms may damp out the small fluctuations (errors) in the input data while others might amplify such errors. Calculations that are shown to attenuate approximation errors are called numerically stable. One of the common tasks of numerical analysis is to try to select algorithms which are robust, i.e., do not produce significantly different results for a very small change in the input data. In terms of numerical stability, the main concern is addressed by applying strategies to avoid ill-conditioned time vectors.
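A two-line experiment illustrates the conditioning concern: for the classical Läuchli matrix, forming the normal-equation matrix ΦᵀΦ roughly squares the condition number, while a QR-based solve works on Φ directly (ε below is illustrative):

```python
import numpy as np

# Läuchli example: Φ is moderately ill-conditioned, but the normal-equation
# matrix ΦᵀΦ has cond(Φ)², losing roughly twice as many digits.
eps = 1e-7
Phi = np.array([[1.0, 1.0],
                [eps, 0.0],
                [0.0, eps]])
cond_phi = np.linalg.cond(Phi)             # ≈ √2 / ε
cond_normal = np.linalg.cond(Phi.T @ Phi)  # ≈ cond(Phi)²
```

This is the quantitative reason the QR-solver factors Φ itself instead of working with variance/covariance (normal-equation) matrices.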

2) Computational Cost
The computational cost is the number of floating point operations (FLOPs) needed to execute a computational command, which can be a mathematical operation (e.g., addition, subtraction, or multiplication) or writing to, or reading from, the memory.
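For reference, the textbook leading-order FLOP counts for the linear-algebra kernels used by the two algorithms can be encoded as simple formulas (standard estimates from numerical linear algebra, not the measured figures in the paper's tables):

```python
# Leading-order FLOP estimates for the kernels appearing in the algorithms.
def flops_matmul(n):
    """n×n by n×n matrix multiply."""
    return 2 * n**3 - n**2

def flops_qr(m, n):
    """Householder QR of an m×n matrix (leading terms)."""
    return 2 * m * n**2 - (2 * n**3) / 3

def flops_back_substitution(n):
    """Triangular solve with an n×n upper-triangular matrix."""
    return n**2
```

Comparing flops_qr plus flops_back_substitution against an explicit inverse (≈ 2n³) makes the per-update advantage of the triangular solve concrete.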

3) Data Storage
The LS approach demands memory storage that depends on the batch size of the batch-mode data processing. The same holds for the RLS processing mode, but in an amount that depends on the number of measured output variables and regressors.

4) Dynamic System Critical Time
The critical time of the dynamic system is the overall limitation to establish an online optimal decision, i.e., in our context, a constraint that the optimal controller (CPU time) must provide an action response faster than the time constant of the controlled dynamic system.
To ensure the functionality of the robotic embedded system, the response time (control output) of the control algorithm must be less than the dynamic system critical time. If this condition is not met, the control methodology must still provide a timely response, i.e., an alternative response, such as the previous gain or a sub-optimal gain, is used to keep the multi-agent system running.
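The fallback policy described above can be sketched as a guarded gain update; the deadline value and the dummy solvers are placeholders (a real embedded loop would preempt the solver with a watchdog rather than time it after the fact):

```python
import time

# If the optimal gain update misses the plant's deadline, fall back to the
# previous gain, as described in the text.  DEADLINE_S and the solvers are
# illustrative, not the paper's timing figures.
DEADLINE_S = 0.01   # the response must beat the plant's time constant

def try_gain_update(solver, K_prev, deadline_s):
    start = time.perf_counter()
    K_new = solver()                       # attempt the optimal update
    if time.perf_counter() - start > deadline_s:
        return K_prev, False               # too late: reuse the old gain
    return K_new, True

def fast_solver():
    return [[1.0, 0.5]]

def slow_solver():
    time.sleep(0.05)                       # simulates an overrunning solve
    return [[9.9, 9.9]]

K, ok = try_gain_update(fast_solver, [[0.0, 0.0]], DEADLINE_S)
K_late, ok_late = try_gain_update(slow_solver, K, DEADLINE_S)
```

Here the slow update is discarded and the agent keeps flying on the last admissible gain, which is exactly the "alternative response" requirement of the critical-time metric.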

IV. COMPUTATIONAL EXPERIMENTS
In this section, we present the computational results of the algorithms and evaluate the capability of the agents to follow the leader while maintaining the desired formation. Focusing on the performance of the online control methodology in multi-agent systems, the QR-solver and RLSµ-QR-HDP-DLQR algorithms are evaluated with respect to the convergence of the parameter θ and the evolution of the control process. Furthermore, the reference policy is established using the offline Schur solution of the HJB-Riccati equation. The evaluations aim to verify the convergence behavior of the estimators.
Computational tests were carried out using MATLAB® and comparisons were performed to evaluate the accuracy and convergence of the solutions provided by the algorithms. To demonstrate the effect of the forgetting factor µ, several simulations were conducted. The convergence process of RLS µ -QR-HDP-DLQR for the HJB-Riccati equation solutions for different µ values was verified in [37].
We consider a multi-agent system consisting of three agents: the leader and two followers. The connectivity topology is shown in Figure 1, from which the Laplacian matrices for the multi-agent topologies T1 and T2 are obtained.

A. DRONE MODEL IN Z-AXIS AND NUMERICAL SIMULATION
The first model is the altitude control of the Parrot® mini-drone. Considering only the quad-rotor altitude, the equation that governs the dynamics of this system is given by [38]

m V̇z = Fz = Fc − Fg,

where Fz is the net force applied along the Z-axis, Fc is the force of the drone propellers, and Fg is the force of gravity on the drone. For a multi-rotor moving only along the Z-axis, without any angular motion, the four motors exert the same force, resulting in Fc = 4 k_f ω². The angular velocity of the blades is ω = 22000 rpm, the power constant is k_f = 3.1 × 10⁻⁸, the mass is m = 0.08 kg, and the gravity constant is g = 9.81 m/s². The states of the resulting state-space model are the vertical speed (Vz) and the vertical position (Pz) of the quad-rotor.
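A sketch of the resulting Z-axis model, under the assumption of a double-integrator state [Vz, Pz], Euler discretization, and an rpm-to-rad/s unit conversion (the paper does not state the units of ω explicitly):

```python
import numpy as np

m, g, dt = 0.08, 9.81, 0.01              # mass, gravity, assumed sample time
A_c = np.array([[0.0, 0.0],              # state [Vz, Pz]:  Vz' = u/m
                [1.0, 0.0]])             #                  Pz' = Vz
B_c = np.array([[1.0 / m],
                [0.0]])
A_d = np.eye(2) + dt * A_c               # simple Euler discretization
B_d = dt * B_c

# Net thrust at the quoted operating point, Fc = 4 k_f ω²; the rpm→rad/s
# conversion below is an assumption.
k_f, omega_rpm = 3.1e-8, 22000.0
omega = omega_rpm * 2.0 * np.pi / 60.0
F_net = 4.0 * k_f * omega**2 - m * g
```

The discretized pair (A_d, B_d) is what the DLQR-based algorithms of Section III would operate on, with the net thrust u = Fc − Fg as the control input.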
The simulation experiments are presented for the minidrone Parrot® and the functionality of the QR decomposition is verified on the RLS estimator of the approximated state-value function for the online HDP-DLQR control system, as well as on the linear, least-square solution in QR-solver. The numerical stability, trajectory, and formation convergence of the QR-solver and RLS µ -QR-HDP-DLQR algorithms are analyzed using a system with three agents, i.e., the leader and two followers.
The desired trajectory of the leader and the followers are shown in Figure 2, where the former starts from point A (2m high) to point B (3m high) and returns to the starting point. The followers execute the same form of trajectory following the formation points (A 1 , B 1 ) for Follower 1 and (A 2 , B 2 ) for Follower 2.

1) QR-Solver Evaluation
The control update using the QR-solver algorithm is shown in Figure 3 for a cycle of 10 iterations. Figure 3 shows the convergence of p11, p22, p33, and p44 of matrix P, corresponding to components θ1, θ5, θ8, and θ10 of the parameter vector θ for Follower 1, respectively. The estimated parameter P reaches the optimal solution P0 (Schur solution) after approximately 10 iterations of the control update. The evolution of the iterative process for the QR-solver algorithm is shown in Figure 4 for a 10 s cycle of the control process. Figure 4 shows the trajectory of the elements (leader, Follower 1, and Follower 2) and the desired trajectory (leader*, Follower 1*, and Follower 2*) of the multi-agent system during the iterative process. The leader moves from position A (2 m) to position B (3 m) several times, changing its altitude (Z-axis). The followers maintain the altitude distance with respect to the flight formation plan shown in Figure 2. After the control update (gain update toward the optimal solution) is applied within a 5 s period, Follower 1 and Follower 2 track the trajectory set by the leader, and after 9 s they reach the desired formation.
The QR-solver showed satisfactory results for trajectory tracking and formation control. It required 5 s to compute the new gain (batch solution); this time is used to fill the matrices Γ and δ of Eqs. (15) and (16), necessary for the calculation of Φ_i^k and y_i^k in Eq. (12).

2) RLSµ-QR-HDP-DLQR evaluation
The results of the RLS µ -QR-HDP-DLQR experiments are presented in an analogous form to those of the QR-solver.
A cycle of 200 iterations of the control update and 20 s of the control process (sample time 0.1 s) for ALGORITHM 2 are shown in Figure 5 and Figure 6. Using the forgetting factor µ = 0.9142, Figure 5 shows the convergence of parameters θ1, θ5, θ8, and θ10 of the parameter vector θ for Follower 1. The convergence of the θ parameters (p11, p22, p33, and p44) presents numerical instabilities in the first 50 iterations, after which the solution becomes admissible around iteration 80. Figure 6 shows the results of the RLSµ-QR-HDP-DLQR algorithm. Here, Follower 1 and Follower 2 spend more time acquiring the desired formation compared to the previous algorithm; only the interval from 10 s to 20 s of the simulation is shown. This is the period during which the followers track the desired trajectory and formation; the remainder of the simulation is omitted owing to a lack of significant information and to stress the convergence of the trajectory tracking and formation control. The RLSµ-QR-HDP-DLQR algorithm also showed satisfactory results for multi-agent control despite the delay in assembling the desired formation. As an advantage of this algorithm, if any variation in the process parameters occurs, its adaptability ensures that a new control policy P will be calculated for a controller gain K update (actor/critic architecture).

3) Results of the Numerical Simulation
The convergence results of the P matrix for the drone model under ALGORITHM 1 and ALGORITHM 2 are presented for the first follower (Follower 1), along with the numerical values of the Schur solution, in Table 1. The convergence values of the matrix P presented in the graphs of Figure 5 are multiplied by 10⁻³. The computational costs of applying the control update of ALGORITHM 1 and ALGORITHM 2 to the drone model along the Z-axis are calculated using [31] and can be seen in Table 2. Analyzing the results of Experiment 1, it can be observed that the per-update computational cost of RLSµ-QR-HDP-DLQR is lower than that of the QR-solver, suggesting that the algorithm is more viable for real-time implementation with microcontrollers. However, this cost is incurred at each iteration of ALGORITHM 2, resulting in a higher cost in the long run.

B. TERRESTRIAL ROBOT MODEL IN X- AND Y-AXES
To verify the operation of the control algorithms on models with more degrees of freedom (DoF), the model of a terrestrial robot with two DoF (X-axis and Y-axis) is chosen. The state-space matrices A and B are given in [39]; the states are the position and velocity along the X-axis (Px, Vx) and the position and velocity along the Y-axis (Py, Vy). The simulation results are presented for the terrestrial robot, and the numerical stability, trajectory, and formation convergence of the QR-solver and RLSµ-QR-HDP-DLQR algorithms were analyzed for a system with three agents: the leader and two followers.
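Under the assumption that each axis is a double integrator with state ordering [Px, Vx, Py, Vy] (the paper's exact matrices are in [39]), the 2-DoF model can be sketched as:

```python
import numpy as np

dt = 0.01                                 # assumed sample time
A_c = np.array([[0.0, 1.0, 0.0, 0.0],     # Px' = Vx
                [0.0, 0.0, 0.0, 0.0],     # Vx' = ax (input)
                [0.0, 0.0, 0.0, 1.0],     # Py' = Vy
                [0.0, 0.0, 0.0, 0.0]])    # Vy' = ay (input)
B_c = np.array([[0.0, 0.0],
                [1.0, 0.0],
                [0.0, 0.0],
                [0.0, 1.0]])
A_d = np.eye(4) + dt * A_c                # Euler discretization
B_d = dt * B_c
```

The X and Y channels are decoupled, so the circular reference of Figure 7 is obtained by feeding sinusoidal position references of equal amplitude and a 90° phase shift to the two channels.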
The desired leader and follower trajectories are shown in Figure 7, where the leader trajectory is a circle of 2 m diameter in the X-Y plane. The followers must follow this circular trajectory while maintaining a formation distance of 0.2 m between each agent.

1) QR-Solver Evaluation
The evolution of the iterative process for the QR-solver is shown in Figure 8 and Figure 9, for a cycle of 10 iterations of the control update part of ALGORITHM 1 and 10 s of the iterative process. Figure 8 (a)-(d) shows the convergence behavior of the elements p11, p22, p33, and p44 of matrix P, corresponding to the components θ1, θ5, θ8, and θ10 of the parameter vector θ of Follower 1, respectively. The estimated parameter P reaches the Schur solution P0 in approximately 7 iterations. Figure 9 shows the trajectory of the agents during the iterative process, showing the circular motion of the leader. The followers maintain the distance with respect to the desired formation shown in Figure 7.
After the control update has been applied (at 5 s), Follower 1 and Follower 2 track the trajectory defined by the leader, and after 6 s they reach the desired formation. The QR-solver algorithm used for the control of terrestrial robots presented significant results for trajectory and formation control.

2) RLSµ-QR-HDP-DLQR Evaluation
A cycle of 200 iterations of the control update of ALGORITHM 2 is shown in Figure 10. Using the forgetting factor µ = 0.9142, Figure 10 shows the convergence of θ1, θ5, θ8, and θ10 of the parameter vector θ of Follower 1. Figure 11 presents the results of the RLSµ-QR-HDP-DLQR algorithm. During the 20 s of the control process, Follower 1 and Follower 2 begin with a back-and-forth movement; after 5 s, they copy the trajectory of the leader (circular motion). After 10 s, they start to follow the formation, and at close to 15 s, the followers track the desired formation and trajectory. The RLSµ-QR-HDP-DLQR algorithm also showed good results for convergence of the trajectory and formation in multi-agent control.

3) Results of Numerical Simulation
The convergence results of the P matrix for the terrestrial robot model under ALGORITHM 1 and ALGORITHM 2 are presented for the first follower (Follower 1), along with the numerical values of the Schur solution, in Table 3. The convergence values of matrix P presented in the graphs of Figure 10 are multiplied by 10⁻³. The computational costs of applying the control update of ALGORITHM 1 and ALGORITHM 2 to the terrestrial robot model along the X- and Y-axes are calculated using [31] and can be seen in Table 4. Analyzing the results of Experiment 2, it was observed that the computational cost of RLSµ-QR-HDP-DLQR is lower than that of the QR-solver. However, in this experiment, ALGORITHM 2 required twice as much time as ALGORITHM 1 to achieve convergence of the trajectory tracking and formation control.

C. COMPARISON BETWEEN MAS CONTROL ALGORITHMS
To evaluate the merits of the QR-Solver and RLSµ-QR-HDP-DLQR algorithms for trajectory tracking and formation control of MAS, a comparison is performed between the algorithms presented in this work and an approach based on a consensus algorithm [40]. The algorithms were compared in terms of the time needed to reach the desired trajectory and the mean absolute formation error of the N agents. The results obtained are shown in Table 5. The formation error of each agent is E(L,1) = h(L,1) − d(L,1), where h(L,1) and d(L,1) are the desired and measured distances between the leader and Follower 1, respectively.
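The tabulated formation metric can be computed as a mean absolute error over the run; the distance samples below are illustrative, not values from Table 5:

```python
import numpy as np

# Mean absolute formation error: average of |h − d| over the run,
# where h is the desired and d the measured leader–follower distance.
def formation_mae(desired, measured):
    desired, measured = np.asarray(desired), np.asarray(measured)
    return np.mean(np.abs(desired - measured))

h = [0.20, 0.20, 0.20, 0.20]       # desired leader-follower distance (m)
d = [0.25, 0.22, 0.19, 0.20]       # measured distance samples (m)
mae = formation_mae(h, d)          # ≈ 0.02 m for these samples
```

Averaging the absolute error, rather than the signed error E = h − d directly, prevents overshoot and undershoot from cancelling in the comparison.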
We conclude that ALGORITHM 1 has a higher one-time cost when applying the control update, generating a higher burden on CPU and memory resources, whereas ALGORITHM 2 has a lower cost that recurs at every iteration, generating continuous usage of CPU and memory resources, greater energy consumption, and a reduced operating time of the agent/robot.
Simulations with a larger number of agents are performed to evaluate the convergence of the control algorithms. The achieved results show convergence of the algorithms QR-Solver and RLS µ -QR-HDP-DLQR in trajectory tracking and formation control of systems with multiple agents and with any viable formation [41].
For collision avoidance, a well-established technique is to add a barrier function term in the optimal control cost function Eq. (6) (see [42]- [44]), which prevents collisions between agents.

V. CONCLUSION
We presented two algorithms for the trajectory tracking and formation control of multi-agent systems. Based on numerical simulations, the QR-solver and RLS µ -QR-HDP-DLQR algorithms show satisfactory results for trajectory tracking and formation convergence. The QR-solver required less time to achieve convergence of the estimated parameters because of fewer updates. The algorithm based on HDP-DLQR is adaptable to exogenous perturbations in the plant in terms of the assigned trajectory and formation.
Based on the analysis, and focusing on embedding the two algorithms in microcontrollers for real-world applications, the QR-solver algorithm is computationally more stable and faster in terms of gain convergence, two essential features for real-time control applications. The RLSµ-QR-HDP-DLQR algorithm has superior adaptability, i.e., it is better at overcoming system uncertainties.
Future work will involve the implementation of the QR-solver and RLSµ-QR-HDP-DLQR algorithms on mobile robots. For this, it will also be necessary to add a collision avoidance routine to the presented algorithms, preventing the agents from colliding with obstacles.