Consensus Tracking via Iterative Learning for Multi-Agent Systems With Random Initial States

In this paper, a distributed consensus iterative learning control algorithm is proposed for the finite-time consensus tracking of multi-agent systems with random initial states. The tracking errors of each agent and its neighbours are used to successively rectify the control protocol when only some agents can obtain the desired trajectory information. Additionally, a time interval is designed in the control protocol, and the random initial state errors are rectified, one by one, within this interval, which gradually shortens as the number of iterations increases. Furthermore, the convergence of the algorithm is theoretically proven by the contraction mapping method, and the convergence condition is derived. Under the proposed algorithm, the time interval in which the desired trajectory cannot be consistently tracked, owing to the random initial state errors, gradually shortens as the iterations increase; consequently, the interval of full consensus tracking of the multi-agent system gradually widens. Finally, simulation examples are provided to verify the effectiveness of the algorithm.


I. INTRODUCTION
Multi-agent systems are composed of several single agents that coordinate and cooperate to achieve a common goal by communicating with each other in accordance with a control protocol. Multi-agent systems can complete tasks that are difficult for single agents to complete, and thus they enhance the ability to solve complex problems. Additionally, multi-agent systems have the advantages of strong reliability, robustness and flexibility [1], [2]. With the development of sensors, computers, communications and other technologies, multi-agent systems have found a wide range of applications in mobile sensors [3], drone group control [4], [5], event-triggered control of networked systems [6], [7] and multi-robot formation control [8], [9].
Because of the collaborative control of multi-agent systems, the consensus problem is a key issue and is also one of the hot research topics in the control field. So-called consensus means that individuals in a multi-agent system use their own and their neighbours' information to construct a control protocol, so that the states or outputs of all individuals eventually converge [10]. In recent years, scholars have studied the consensus of multi-agent systems via Lyapunov functions, matrix theory, algebraic graph theory and other methods and have obtained fruitful research results [11]-[15]. However, most current results concern asymptotic consensus tracking as time increases, rather than full consensus tracking over the entire finite time interval.
For systems implementing repetitive tasks in finite time intervals, iterative learning control can achieve full tracking of the desired trajectory over the entire finite time interval [16]. The idea is to use the deviation between the actual output and the desired trajectory to successively rectify the control signal until zero-error tracking is achieved in the finite time interval. Because the design of an iterative learning controller requires little knowledge of the system's dynamic characteristics, and the controller has a simple structure and executes a small amount of online computation, the technique has received a great deal of attention from researchers since it was proposed by Arimoto [16], and many research results have been obtained [17]-[21].
To ensure the convergence of the algorithm, existing research results often assume that the initial state of the system satisfies certain conditions when designing iterative learning controllers. These assumptions can be divided into four categories. In the first category, the initial state error is assumed to be zero [22]-[24]; that is, the initial state of the system must equal the desired initial state at each iteration. In the second category, the initial state error is assumed to be fixed [10], [25], [26]; that is, the initial state at each iteration is fixed, but it may not equal the desired initial state. However, in the running process of an actual system, it is difficult to keep the initial state of each iteration completely consistent, so these two kinds of strict repetition conditions limit the application of iterative learning control. To get closer to the actual operation of the system, some scholars have proposed that the initial state error be confined to a bound [27], [28], which is the third kind of assumption on the initial state. Under this assumption, the tracking error can only converge to a bound related to the initial state error, so zero-error tracking of the desired trajectory cannot be achieved. To eliminate the influence of the initial error on tracking accuracy, an iterative learning control algorithm with an initial rectifying action was proposed in [29], which achieves complete tracking after a specified time interval. However, that algorithm still requires the initial state error to be bounded and cannot handle the case of a random initial state. In recent years, in order to further relax the restrictions on the initial positioning conditions, some scholars have allowed the initial state at each iteration to be a random value [17], [30], [31], which is the fourth kind of assumption.
Under this fourth assumption, zero-error complete tracking can be obtained in a specified interval. However, to date, this line of work only analyses the convergence of a single agent and does not solve the problem of consensus tracking for multi-agent systems with random initial states.
In light of the advantages of iterative learning control, many scholars have applied it to the coordination control of multi-agent systems [10], [24], [32]-[38]. In [10], a distributed iterative learning control method was designed to solve the formation control problem of discrete-time multi-agent systems. For a class of non-affine nonlinear multi-agent systems, an iterative learning control algorithm was proposed in [32] such that, when the number of iterations is sufficiently large, the agent outputs reach consensus. For discrete-time linear multi-agent systems with communication delays and switching topologies, an iterative learning control protocol was devised in [33] that obtains consensus of the multi-agent outputs in a finite time interval. Similarly, [34] presented an iterative learning control protocol using a Bernoulli sequence to describe data dropout, so that the outputs of nonlinear multi-agent systems with data dropout achieve consensus tracking of the desired trajectory in finite time. To solve the consensus tracking problem for linear multi-agent systems with a variable communication topology, an input-sharing iterative learning control protocol was put forward in [35]. In [36], for an output-saturated multi-agent system, an iterative learning control protocol was proposed under a fixed topology to achieve consensus tracking of the system output within a finite time, and the results were generalized to multi-agent systems with switching topologies.
All the abovementioned results set the initial state of the system according to one of two assumptions in order to guarantee the convergence of the algorithms. The first assumes that the initial state of the system is exactly the same as the desired state at each iteration [24], [35]; the other assumes that the initial state stays fixed at each iteration [10], [32]-[34], [36], [37]. Both assumptions are too strict for multi-agent systems in practical operation. As a result, the above conclusions do not apply if the initial state of the multi-agent system differs at each iteration. To relax the restrictions on the initial state, the article recently presented in [38] used the Lyapunov method to design a consensus iterative learning controller for non-parametric uncertain multi-agent systems, solving the consensus tracking problem with random initial states. However, the approach of [38] cannot cause the consensus tracking interval to widen gradually as the iterations increase; it only achieves consensus tracking over a predetermined sub-interval of the finite time interval. That is, it cannot fully achieve consensus zero-error tracking throughout the finite time interval (0, T].
To resolve the finite-time consensus tracking problem of multi-agent systems with random initial states, a distributed consensus iterative learning control algorithm is proposed. The desired trajectory in this algorithm is generated by an introduced virtual leader, and the desired trajectory information is available only to some agents. Using the tracking errors of each agent and its neighbours, the control protocols are rectified iteration by iteration. Additionally, a time interval is designed in the control protocol, and the random initial state errors are rectified within this interval, which gradually shortens as the iterations increase. The contraction mapping method theoretically demonstrates that, after the designed time interval, the multi-agent output completely and consistently tracks the desired trajectory as the iterations increase, and that the time interval of full consensus tracking gradually widens with the iterations, ultimately achieving full-consensus zero-error tracking throughout the finite time interval (0, T]. Finally, the effectiveness of the proposed algorithm is verified by simulation examples.

II. PROBLEM DESCRIPTION
Consider a multi-agent system composed of N same-structure nonlinear agents, and assume that the initial state of each agent is random at each iteration and that each agent runs repetitively over t ∈ [0, T]. The dynamic model of the j-th nonlinear agent is expressed as follows:

ẋ_{j,k}(t) = f(x_{j,k}(t), t) + B u_{j,k}(t),
y_{j,k}(t) = C x_{j,k}(t), (1)

where x_{j,k}(t) ∈ R^n, u_{j,k}(t) ∈ R^r and y_{j,k}(t) ∈ R^m represent the state, control input and output of the j-th agent, respectively. The nonlinear function f(x_{j,k}(t), t) is unknown, B ∈ R^{n×r} is a known, right-invertible input matrix, and C ∈ R^{m×n} is a known output matrix. The subscript k denotes the number of the iterative learning trial, j (j = 1, 2, ..., N) denotes the j-th agent, and t ∈ [0, T] denotes the finite time of the repetitive operation.

Assumption 1: Each agent's nonlinear function f(·, ·) satisfies the global Lipschitz condition for all t ∈ [0, T]; that is, there exists a constant k_f > 0 such that, for any x_1, x_2 and all t ∈ [0, T], the following inequality holds:

‖f(x_1, t) − f(x_2, t)‖ ≤ k_f ‖x_1 − x_2‖. (2)

Assumption 2: For each agent there exists a unique desired control u_d(t); the corresponding desired state and output are denoted x_d(t) and y_d(t) = C x_d(t), respectively.

The desired trajectory is provided to the agents by introducing a virtual leader into the study. Because the agents exchange information with each other through the communication topology, the information each agent can obtain depends on the communication topology between the agents. Thus, relevant notions of graph theory are introduced in order to lay the foundation for the subsequent analysis.
The communication topology among the N agents is described by a weighted graph ζ = (v, ε, A), where v = {v_1, v_2, ..., v_N}, ε and A denote the node set, edge set and adjacency matrix of the graph, respectively. Each node in v represents an agent.
(v_i, v_j) ∈ ε represents an edge composed of nodes i and j; the adjacency matrix A = [a_{i,j}] represents the connections between nodes, and its diagonal elements satisfy a_{i,i} = 0. If there is a connection between nodes i and j, then a_{i,j} = a_{j,i} > 0; otherwise a_{i,j} = 0, where j ∈ N_i and N_i = {j} is the neighbour node set of node i.
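These graph notions can be made concrete in a short sketch; the 4-node ring topology below is a hypothetical example, chosen only to illustrate the adjacency matrix, the neighbour sets N_i, and the graph Laplacian L = D − A used in the later analysis:

```python
import numpy as np

# Hypothetical 4-node undirected graph: a_{ij} = a_{ji} > 0 iff (v_i, v_j)
# is an edge, and the diagonal entries satisfy a_{ii} = 0.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
assert np.allclose(A, A.T) and np.all(np.diag(A) == 0)

# Neighbour sets N_i and the graph Laplacian L = D - A
neighbours = {i: set(np.flatnonzero(A[i])) for i in range(len(A))}
Lap = np.diag(A.sum(axis=1)) - A
assert np.allclose(Lap.sum(axis=1), 0)   # each row of L sums to zero
print(neighbours[0])
```

The zero row sums of L reflect that only relative information between a node and its neighbours enters the distributed errors used below.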
When the virtual leader is introduced, labelling the leader as agent 0, the new graph ζ̄ = (0 ∪ v, ε̄, Ā) represents the information relationships among all agents including the virtual leader, where ε̄ and Ā denote the new edge set and adjacency matrix, respectively. Here, s_i represents the connection between agent i and the virtual leader: s_i = 0 indicates that agent i has no connection with the virtual leader, while s_i > 0 indicates that agent i has a direct connection with the virtual leader.
According to the aforementioned analysis, in order to enable the multi-agent system, whose initial state is random at each iteration, to achieve consensus tracking of the desired trajectory in the finite time interval, with a tracking interval that gradually widens as the iterations increase, the following distributed iterative learning control algorithm is designed:

u_{j,k+1}(t) = u_{j,k}(t) + B_R^{−1} Γ ξ̇_{j,k}(t) + B_R^{−1} θ_{k+1,h}(t) (x_d(0) − x_{j,k+1}(0)), (4)

ξ_{j,k}(t) = Σ_{i∈N_j} a_{j,i} (y_{i,k}(t) − y_{j,k}(t)) + s_j (y_d(t) − y_{j,k}(t)), (5)

θ_{k,h}(t) = β^k/h for t ∈ [0, h/β^k), and θ_{k,h}(t) = 0 for t ∈ [h/β^k, T], (6)

where β > 1 is an adjustment parameter; 0 < h < T is a predetermined time constant; Γ is a learning gain matrix; e_{j,k}(t) = y_d(t) − y_{j,k}(t) and e_{i,k}(t) = y_d(t) − y_{i,k}(t) are the tracking errors of the j-th and i-th agents, respectively; and B_R^{−1} is the right inverse matrix of B.
For convenience of analysis, denote x_k(t), u_k(t), y_k(t), e_k(t) and F(x_k(t), t) as the column-stacked vectors of the corresponding agent-wise quantities. Thus, (1), (4) and (5) are rewritten in the following compact form:

ẋ_k(t) = F(x_k(t), t) + (I ⊗ B) u_k(t), y_k(t) = (I ⊗ C) x_k(t), (7)

u_{k+1}(t) = u_k(t) + (I ⊗ (B_R^{−1} Γ)) ξ̇_k(t) + r_{k+1}(t), (8)

r_{k+1}(t) = (I ⊗ B_R^{−1}) θ_{k+1,h}(t) (1_N ⊗ x_d(0) − x_{k+1}(0)), (9)

with ξ_k(t) = ((L + S) ⊗ I) e_k(t), where ⊗ is the Kronecker product, I is an identity matrix with the corresponding dimension, L is the Laplacian matrix of the graph and S = diag(s_1, s_2, ..., s_N).
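As a numerical check of the compact Kronecker form, the sketch below verifies that ((L + S) ⊗ I) applied to the stacked tracking errors reproduces the agent-wise neighbour sums; the 4-agent topology, the output dimension and the error values are assumptions used only for illustration:

```python
import numpy as np

# Assumed 4-agent ring topology with leader pinning S = diag(2, 0, 2, 0).
A = np.array([[0,1,0,1],[1,0,1,0],[0,1,0,1],[1,0,1,0]], float)
L = np.diag(A.sum(1)) - A
S = np.diag([2., 0., 2., 0.])
m = 2                                  # assumed output dimension
rng = np.random.default_rng(1)
e = rng.standard_normal((4, m))        # e_{j,k}(t) = y_d(t) - y_{j,k}(t) at one t

# Agent-wise form: xi_j = sum_{i in N_j} a_{ji}(e_j - e_i) + s_j e_j,
# which equals sum a_{ji}(y_i - y_j) + s_j (y_d - y_j).
xi_agent = np.array([sum(A[j, i]*(e[j] - e[i]) for i in range(4)) + S[j, j]*e[j]
                     for j in range(4)])

# Compact form acting on the stacked error vector
xi_stack = (np.kron(L + S, np.eye(m)) @ e.reshape(-1)).reshape(4, m)
assert np.allclose(xi_agent, xi_stack)
```

The agreement of the two forms is what allows the convergence analysis to be carried out on the stacked system with the single matrix (L + S).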
The main objective of this paper is to design a distributed consensus iterative learning control algorithm (4) ∼ (6) for a class of nonlinear multi-agent systems with random initial states at each iteration, such that the multi-agent output can fully and consistently track the desired trajectory on the finite time interval t ∈ (0, T] when the number of iterations is sufficiently large.

Remark 1: The introduced virtual leader only provides the desired trajectory and cannot obtain information from the agents; however, each agent can obtain information from its neighbours (including the virtual leader).
Remark 2: This article assumes that only some agents can directly obtain the desired trajectory information, that is, only the agents who have direct communication with the virtual leader can obtain the desired trajectory information in the communication topology.

III. CONVERGENCE ANALYSIS
For convenience of the proof of the convergence of the proposed algorithm, the following related definitions and lemmas are given.
Definition 1: The λ-norm of a vector function h(t) on the interval t ∈ [0, T] is defined as

‖h‖_λ = sup_{t∈[0,T]} e^{−λt} ‖h(t)‖,

where ‖·‖ is a norm on R^n, λ > 0, and t ∈ [0, T] is the time variable.
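A numerical sketch of this definition, evaluated on a time grid (the function h and the value of λ are illustrative choices):

```python
import numpy as np

# Numerical evaluation of the lambda-norm ||h||_lambda =
# sup_{t in [0,T]} e^{-lambda t} ||h(t)|| on a grid.
def lambda_norm(h_vals, ts, lam):
    return np.max(np.exp(-lam*ts) * np.linalg.norm(h_vals, axis=1))

T = 1.0
ts = np.linspace(0.0, T, 1001)
h_vals = np.stack([np.exp(ts), np.zeros_like(ts)], axis=1)  # h(t) = (e^t, 0)

# For h(t) = e^t and lambda > 1, e^{-lambda t} e^t peaks at t = 0,
# so the lambda-norm equals 1 even though sup_t ||h(t)|| = e.
print(lambda_norm(h_vals, ts, lam=2.0))
```

This weighting is what lets a growing error envelope be treated as a contraction: choosing λ large enough discounts the Lipschitz-driven growth of the state error.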
Lemma 1 [39]: For any λ > 0,

sup_{t∈[0,T]} { e^{−λt} ∫_0^t e^{λτ} dτ } = (1 − e^{−λT})/λ ≤ 1/λ.

The main results of this paper are given as follows.

Theorem 1: For the multi-agent system (7) satisfying Assumptions 1∼2, when the initial state at each repetition is random, if the learning gain satisfies

‖I − (L + S) ⊗ (CBΓ)‖ < 1, (10)

then the control algorithms (8) ∼ (9) and (6) cause the multi-agent output y_k(t) to have the following characteristics:

i) In the interval t ∈ [h/β^k, T], when the number of iterations k → ∞, lim_{k→∞} ‖e_k‖_λ = 0; that is, the output y_k(t) completely and consistently tracks the desired trajectory y_d(t).
ii) In the interval t ∈ [0, h/β^k), owing to the influence of the random initial state error, the output y_k(t) cannot track the desired trajectory y_d(t); however, at t = h/β^k, the influence of the random initial state error halts completely. Moreover, the interval [0, h/β^k) gradually shortens as the number of iterations increases: when k → ∞, h/β^k → 0, so [h/β^k, T] → (0, T]; that is, the multi-agent output completely and consistently tracks the desired trajectory on t ∈ (0, T].

Proof: It is first proven that the control laws (8) ∼ (9) and (6) make the tracking error of system (7) converge to zero on t ∈ [h/β^k, T]. Then it is shown that the time interval t ∈ [0, h/β^k), in which the system output cannot track the desired output because of the random initial state error, shortens as the number of iterations increases.
i) When t ∈ [h/β^k, T], from the first expression of (7), the state error between two adjacent iterations is obtained as (11). Computing (u_{k+1}(τ) − u_k(τ)) from (8) and left-multiplying it by (I ⊗ B), we obtain (12). Left-multiplying both sides of (9) by (I ⊗ B), we obtain (13), and from (13) we have (14). Substituting (14) and (12) into (11), we have (15). From the definition of the tracking error, we obtain (16).

Substituting (15) into (16) yields (17), and from (17) we obtain (18). When t ∈ [h/β^k, T], from (6) we know that

∫_0^t θ_{k,h}(τ) dτ = 1. (19)

Therefore, substituting (19) into (18) yields (20). Taking the norm on both sides of (20) and combining it with Assumption 1, we have

‖e_{k+1}(t)‖ ≤ ‖I − (L + S) ⊗ (CBΓ)‖ ‖e_k(t)‖ + b_1 ∫_0^t ‖e_k(τ)‖ dτ, (21)

where b_1 is a constant determined by L, S, C, Γ and k_f. Then, multiplying both sides of (21) by e^{−λt} (λ > 0) and combining with the definition of the λ-norm and Lemma 1, we get (22). Substituting (19) into (15) gives (23). Taking the norm of both sides of (23) and applying Assumption 1, we obtain (24), where b_2 = ‖(L + S) ⊗ B‖. Similarly, multiplying both sides of (24) by e^{−λt} (λ > 0) and combining with the definition of the λ-norm and Lemma 1, we obtain (25). When λ > k_f, from (25) we get (26). Substituting (26) into (22) gives

‖e_{k+1}‖_λ ≤ ρ_1 ‖e_k‖_λ, (27)

where ρ_1 depends on the gain condition (10) and on λ. From (27), when k = 1, there exists ‖e_2‖_λ ≤ ρ_1 ‖e_1‖_λ (28); when k = 2, we have ‖e_3‖_λ ≤ ρ_1 ‖e_2‖_λ ≤ ρ_1^2 ‖e_1‖_λ (29). By analogy, we get

‖e_k‖_λ ≤ ρ_1^{k−1} ‖e_1‖_λ. (30)

It is known from (30) that the λ-norm of the tracking error contracts geometrically across iterations: as long as 0 < ρ_1 < 1 is satisfied, lim_{k→∞} ‖e_k‖_λ = lim_{k→∞} ρ_1^{k−1} ‖e_1‖_λ = 0 holds as k → ∞. From condition (10) in Theorem 1, 0 < ρ_1 < 1 holds if λ is sufficiently large. Therefore, from (30) we know that lim_{k→∞} ‖e_k‖_λ = 0; that is, the multi-agent output y_k(t) fully and consistently tracks the desired trajectory y_d(t) on [h/β^k, T].

ii) When t ∈ [0, h/β^k), the system output cannot track the desired trajectory because of the random initial state error at each iteration. We now analyse how the proposed algorithm guarantees that this time interval shortens as the iterations increase. From (6), within t ∈ [0, h/β^k), there holds

∫_0^t θ_{k,h}(τ) dτ − 1 < 0, (31)

and from (31) we know that ∫_0^t θ_{k,h}(τ) dτ − 1 = 0 exactly when t = h/β^k.
Therefore, from (18) we know that the system output cannot track the desired output on [0, h/β^k) because of the random initial state error, and that this influence halts completely at t = h/β^k. Moreover, h/β^k → 0 as k → ∞; that is, as the number of iterations increases, the time interval [0, h/β^k) in which the output cannot track the desired output shortens.
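The shrinking of the rectifying interval [0, h/β^k) and the unit-integral property of θ_{k,h}(t) can be checked numerically. The piecewise-constant form of θ_{k,h}(t) used below is an assumption, chosen to be consistent with ∫_0^t θ_{k,h}(τ)dτ reaching 1 at t = h/β^k:

```python
import numpy as np

h, beta = 0.1, 1.03   # values from the simulation section

def theta(t, k):
    # assumed form: beta^k/h on [0, h/beta^k), zero afterwards
    return beta**k / h if t < h / beta**k else 0.0

widths = [h / beta**k for k in (0, 20, 60, 200)]
print(widths)   # the rectifying interval shrinks monotonically toward zero

for k in (0, 20, 60):
    tk = h / beta**k
    grid = np.linspace(0.0, tk, 100001)
    dg = grid[1] - grid[0]
    integral = sum(theta(t, k) for t in grid[:-1]) * dg   # left Riemann sum
    assert abs(integral - 1.0) < 1e-6   # the integral reaches 1 at t = h/beta^k
```

The constant unit integral is what lets the rectifying action cancel the whole initial state offset by the end of the window, while the window itself vanishes as k grows.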
This completes the proof.

Remark 3: It can be seen from (18) and (19) that the effect of the initial state error on the tracking performance is eliminated with the help of the integral action of θ_{k,h}(t). If θ_{k,h}(t) were replaced with the pulse function δ(t) (a time function independent of the iteration number), the influence of the initial state error could be instantly eliminated at the beginning of each iteration; however, this cannot be realized in an actual system, since the controller output would tend to infinity. Instead, θ_{k,h}(t) in (4) approximates the instantaneous integral action of the pulse function step-by-step as the number of iterations increases. Although θ_{k,h}(t) gradually grows with the iterations, and thus the controller output increases step-by-step, in practical engineering terms the tracking error meets the requirements once it reaches a certain accuracy, rather than having to equal zero exactly; that is, the number of iterations is finite, not infinite. Therefore, from a practical engineering point of view, the output of the controller designed in this paper will not reach infinity.

IV. SIMULATION ANALYSIS
To verify the effectiveness of the proposed algorithm, consider a nonlinear multi-agent system composed of four same-structure agents and one virtual leader, whose topology is shown in Fig. 1. The system runs repetitively over a finite time interval, and the initial state is random at each iteration. The model of the j-th agent (j ∈ {1, 2, 3, 4}) takes the form of (1). In Fig. 1, node 0 represents the virtual leader, which provides the given desired trajectory. From Fig. 1, only some of the agents (agent 1 and agent 3) can directly obtain the desired trajectory information, so S = diag(2, 0, 2, 0), and the Laplacian matrix L is determined by the topology in Fig. 1. The multi-agent system is controlled by algorithms (8)∼(9) and (6). Assume that the repetitive run time is t ∈ [0, 1] and the desired output is y_d(t) = sin(2πt). At each iteration, the initial state x_{j,k}(0) of each agent is randomly generated by the function rand(·), i.e., x_{j,k}(0) = [rand(1), rand(1)]^T, where rand(·) returns a random number between 0 and 1. The initial control is selected as u_{j,0}(t) = [0, 0]^T, β = 1.03 and h = 0.1, and the learning gain Γ is chosen to satisfy condition (10). Fig. 6 shows the consensus tracking curves of all the agents at the 60th iteration, and Fig. 7 shows the maximum tracking errors of the four agents' outputs at each iteration over the time interval t ∈ [h/β^k, T].
From Figs. 2∼6 we know that, under the random initial state condition, the output of each agent achieves full consensus tracking over the time interval t ∈ [h/β^k, T] at each iteration, and the interval in which the desired trajectory cannot be tracked shortens as the iterations increase. Additionally, Fig. 7 shows that the tracking error gradually tends to zero within the time interval [h/β^k, T] as the number of iterations increases. The reason is that the algorithm designs a time interval [0, t_1) in which the random initial state error is rectified, where t_1 = h/β^k gradually decreases with the number of iterations. Therefore, the interval [t_1, T] of zero-error tracking of the desired trajectory gradually widens with the iterations, while the interval [0, t_1) in which the desired trajectory cannot be tracked gradually shortens, ultimately achieving complete tracking of the desired trajectory over (0, T] as k → ∞. In contrast, when t_1 = h is a fixed value, zero-error tracking is achieved only over the fixed interval [t_1, T], and the interval [0, t_1), in which the desired trajectory cannot be tracked, does not shorten as the iterations increase. The simulation result for this fixed case at the 60th iteration is shown in Fig. 8.
To see more clearly the influence of the shrinking rectifying interval on the control input as the number of iterations increases, Figs. 9∼13 are provided. Figs. 9∼10 show the variation of the maximum value of the control inputs as the number of iterations increases; Figs. 11∼12 show the time-dependent curves of the control inputs at the 50th iteration; and Fig. 13 shows the variation of the tracking error with time at the 50th iteration.
It can be seen from Figs. 9∼10 that the maximum value of the control input gradually increases with the increase of the number of iterations, but it can be seen from Figs. 11∼12 that the maximum value of the control input only appears in the rectifying interval, while after the rectifying interval, the control input is basically close to 0.
In addition, it can be seen from Fig. 13 that after 50 iterations, the rectifying interval of the initial state error has shortened from [0, 0.1] s at the beginning of the iterations to [0, 0.02] s, which shows that the interval over which the initial state error affects the tracking accuracy is reduced. It can also be seen from Fig. 13 that the tracking error is close to 0 in the interval [0.02, 1] s. This result shows that, although the rectifying interval shortens as the number of iterations increases, the control input gradually increases within the rectifying interval. However, in practical engineering terms, the tracking error meets the requirements once it reaches a certain accuracy, rather than having to equal 0 exactly; in other words, the tracking error can meet the needs of the project after a limited number of iterations. Therefore, from a practical engineering point of view, the control input will not increase infinitely in the rectifying interval.

At the same time, from algorithms (4)∼(6) we know that when h, L and S are fixed, the tracking precision on [h/β^k, T] depends on the number of iterations k and increases gradually with k, while the length of the rectifying interval [0, h/β^k) depends on the parameters β and k. For the same number of iterations k, the larger β is, the shorter the rectifying interval and the larger the control input at the initial rectifying time; the smaller β is, the longer the rectifying interval and the smaller the control input at the initial rectifying time.

To illustrate the influence of β on the rectifying interval and the initial control input, the following simulations are analyzed with different values of β and all other parameters the same. When β = 1.03, the control input at the 50th iteration is shown in Figs. 11∼12, and the tracking trajectories of the four agents are shown in Fig. 16; when β = 1.003, the control input at the 50th iteration is shown in Figs. 14∼15, and the tracking trajectories of the four agents are shown in Fig. 17. Comparing Fig. 11 with Fig. 14, and Fig. 12 with Fig. 15, we can see that, for the same number of iterations, the control input at the initial time decreases markedly as β decreases. From Figs. 16 and 17, we can see that the tracking precision on [h/β^k, T] remains the same, while the rectifying interval widens appropriately.
Therefore, in practical applications, on the premise that the tracking precision meets the requirements (keeping a certain number of iterations unchanged), the rectifying interval can be appropriately relaxed (a smaller β selected) to reduce the control input at the initial rectifying time. This can be regarded as a compromise between the requirements on tracking precision and on the rectifying interval.

V. CONCLUSION
For a class of nonlinear multi-agent systems, a distributed iterative learning control protocol is proposed to solve the consensus tracking problem with random initial states. To overcome the influence of the random initial state error on tracking performance, a time interval is designed in the control protocol so that the random initial state error only affects this interval, which shortens as the number of iterations increases. The validity of the proposed algorithm is theoretically proved, and the convergence conditions are given. Theoretical analysis and simulation results show that the proposed distributed control protocol enables the output of a multi-agent system with random initial states to completely track the desired trajectory within the finite time interval (0, T] after a sufficient number of iterations. This effectively suppresses the influence of the random initial error on the tracking performance and relaxes the restrictions of consensus iterative learning control on the initial positioning conditions.