Distributed Reconfiguration for Resilient Synchronization of Multi-agent Systems

Output synchronization performance in multi-agent systems (MAS) can degrade in the presence of misbehaving agents affected by external attacker actions that try to desynchronize the system. In this paper, we introduce a heuristic algorithm based on best-response games that defines an online distributed reconfiguration strategy to dynamically mitigate these actions and drive the system to a synchronization state. The algorithm provides each unattacked agent with a decision-making process, based on local information, to reconfigure the interaction patterns with its neighbors such that the system eventually synchronizes. We present simulations of the proposed strategy for several systems in realistic scenarios to show the ability of the proposed algorithm to mitigate different types of attacks.


I. INTRODUCTION
The study of distributed control systems has become a relevant topic in control engineering. Its significance lies in the advantages that this type of controller provides in large-scale systems, including scalability and the capability to work in scenarios without full information [1], [2], [3]. A family of distributed control protocols that stands out is the synchronization algorithms. Their many applications in different fields involve formation control and flocking [4], synchronization of coupled oscillators [5], [6], data fusion in sensor networks [7], reinforcement learning [8], [9], and the dynamics of human social systems [10], [11], [12], [13], [14]. Typically, these algorithms assume that all agents in the network are trustworthy, implying that they will follow a pre-defined distributed synchronization protocol. However, the misbehavior of even one agent, deliberate or not, can drive the system away from a synchronization state. For this reason, the design of distributed control protocols that are resilient to the action of misbehaving agents has gained importance over the past years [15], [16], [17], [18]. Resilient synchronization refers to the coordinated action of agents to reach a synchronization state in the presence of such misbehaving agents.
Several protocols have been proposed to overcome the vulnerability of synchronization systems to malicious agents, typically at the expense of increased complexity [19], [20], [21]. For example, in [22], agents that are considered to have the most dissimilar behavior are discarded, allowing unattacked agents to synchronize. This method, known as the weighted-mean subsequence-reduced algorithm, only works on a specific class of graphs, defined as robust networks, and with agents characterized by a one-dimensional state variable. Similarly, the authors in [23] proposed randomly changing the connections over the graph to produce a resilient network. This solution, known as moving target defense, dramatically reduces the effect of misbehaving agents on the network but, due to the randomness, malicious agents still degrade the overall performance of the system. The study in [24] presented a mitigation strategy for linear systems based on the comparison of error sequences using the Kullback-Leibler divergence. With the information provided by the divergence, the nodes in the network can successfully neglect the effect of malicious agents. All the aforementioned algorithms share an important characteristic: they mitigate the effect of misbehaving agents over a network by changing the connections of the communication graph topology while preserving the predefined control algorithm. This supposes an advantage for the

A. GRAPH THEORY CONCEPTS
Communication between agents is described by a weighted directed graph (digraph) G = (V, E, w), where V = {1, ..., N} is the set of agents, E ⊆ V × V is the edge set that describes the interaction between agents, and w : V × V → R_{≥0} is the weighting function of the edge set. The existence of edge (i, j) states that agent i receives information from agent j. The adjacency matrix of a graph, A = [a_ij], is an N × N matrix where a_ij = w(i, j) if (i, j) ∈ E and 0 otherwise. Let us define the neighbor set of node i as N_i := {j : (i, j) ∈ E}. The degree of a node is defined as the sum of the weights of its neighbors, i.e., d_i = Σ_j a_ij. A directed path between two nodes x and y is a sequence of edges ((v_0, v_1), (v_1, v_2), ..., (v_{n−1}, v_n)) where v_0 = x, v_n = y and (v_i, v_{i+1}) ∈ E for all possible i. A graph is strongly connected if there exists a directed path between every pair of nodes in V. Also, the κ-connectivity of a graph is defined as the maximum number of nodes that can be removed while keeping the graph strongly connected. The Katz centrality matrix of a graph, denoted M = [m_ij], measures the relative importance of node j for node i in a directed graph, taking into account all the possible walks between them. This metric is defined as [25]

M = Σ_{k=1}^{∞} η^k A^k = (I − ηA)^{−1} − I,   (1)

where I is the identity matrix and η ∈ [0, 1/d̄) is an attenuation factor that ensures that the summation converges, with d̄ := max_i d_i. In Equation (1), [A^k]_ij represents the weight of a walk from node i to node j in k steps. Then, Katz centrality measures all the possible ways to connect two different nodes, giving more importance to shorter walks. This metric has been widely used to characterize importance in citation and web-page networks. It also characterizes the influence of every node in social networks based on their connections [26], [27], [28]. We propose to use it to quantify the indirect influence of malicious agents in cooperative control algorithms.
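To make the metric concrete, the closed form in Equation (1) can be evaluated directly; the following is a minimal sketch (the function name and example graph are ours, not from the paper):

```python
import numpy as np

def katz_centrality_matrix(A: np.ndarray, eta: float) -> np.ndarray:
    """Katz centrality matrix M = sum_{k>=1} eta^k A^k = (I - eta*A)^{-1} - I.
    Requires eta in [0, 1/d_bar) so the geometric series converges."""
    N = A.shape[0]
    d_bar = A.sum(axis=1).max()                 # maximum degree d_bar
    assert 0 <= eta < 1.0 / d_bar, "eta must lie in [0, 1/d_bar)"
    I = np.eye(N)
    return np.linalg.inv(I - eta * A) - I

# Example: a 3-node directed cycle, where every walk has unit weight.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
M = katz_centrality_matrix(A, eta=0.5)
```

The closed form avoids summing the infinite series explicitly; shorter walks dominate because each extra step multiplies the contribution by η < 1.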

B. GAME THEORY CONCEPTS
A game is defined as a tuple (V, {S_i}_{i∈V}, {L_i}_{i∈V}), where S_i is the set of available strategies for agent i, and S_−i := Π_{j∈N_i} S_j. Let L_i : S_i × S_−i → R be the loss function associated with every agent, which reflects how profitable an action is once the other agents have already chosen theirs. Therefore, each agent can compute its own optimal action using the best response to its neighbors' actions s_−i ∈ S_−i, that is,

s_i* ∈ argmin_{s_i ∈ S_i} L_i(s_i, s_−i).

An important trait of finite N-player non-zero-sum games is that they admit a Nash equilibrium in mixed strategies [29]. Therefore, it is ensured that agents will use an optimal strategy in steady-state conditions, i.e., every agent attains the minimum possible loss in a situation in which all agents try to minimize their own.
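A tiny numerical illustration of these definitions (all values are hypothetical):

```python
import numpy as np

# Agent i has three pure strategies; for a fixed joint neighbor action s_-i,
# the loss L_i(s, s_-i) has already been evaluated for each of them.
losses = {"s1": 2.0, "s2": 0.5, "s3": 1.3}

# Best response: the pure strategy minimizing the loss.
best_response = min(losses, key=losses.get)

# A mixed strategy alpha assigns probabilities to the pure strategies;
# its expected loss is the probability-weighted sum of the pure losses.
alpha = np.array([0.2, 0.5, 0.3])
L = np.array([losses["s1"], losses["s2"], losses["s3"]])
expected_loss = float(alpha @ L)
```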

III. OUTPUT SYNCHRONIZATION IN MAS
Consider a network of agents described by a graph G as mentioned before, governed by the discrete dynamics

x_i(k+1) = f(x_i(k), u_i(k)),  y_i(k) = h(x_i(k)),

where x_i(k) ∈ R^n, u_i(k) ∈ R^m and y_i(k) ∈ R^p are agent i's state, input and output, respectively, at discrete time k. The objective of output synchronization in MAS is to drive all agents to an agreement space, that is,

lim_{k→∞} ‖y_i(k) − y_j(k)‖ = 0, for all i, j ∈ V.

The aforementioned task describes the consensus or output synchronization problem. This task has been addressed and solved for a wide variety of systems under different assumptions. Some of the best-known systems that can achieve consensus are modeled by simple and double integrators, homogeneous linear systems, and passive systems, among others, even in the presence of communication delays or time-varying communication graphs [30], [31], [32], [33]. An important feature of these synchronization algorithms is that they can be described by the control action

u_i(k) = Σ_{j∈N_i} a_ij χ(y_j(k) − y_i(k)),   (2)

where χ : R^p → R^m is an odd, continuous and locally Lipschitz function called the coupling function. This control input makes all agents minimize a pairwise performance function of the form

J_ij(k) = ‖y_j(k) − y_i(k)‖²,   (3)

driving them to the agreement space [34, Theorem 6]. We refer to J_ij as the performance function of the relationship between i and j from agent i's perspective. To ensure the convergence of these algorithms, the communication graph G must be strongly connected [35]. The strong connectivity condition states that every node is reachable from every other node in the graph, ensuring that all agents will synchronize using only local information. Although the parameters of this family of control algorithms affect how the agents synchronize, in the following sections we assume that the control strategy is properly designed, i.e., the chosen χ(·) guarantees synchronization for all agents in the absence of an attack.
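A minimal simulation sketch of this family of protocols, using single-integrator agents and χ = tanh as one admissible odd, locally Lipschitz coupling function (these modeling choices are ours, not the paper's):

```python
import numpy as np

def control_input(i, y, A, chi=np.tanh):
    """Diffusive control action u_i = sum_j a_ij * chi(y_j - y_i)."""
    return sum(A[i, j] * chi(y[j] - y[i]) for j in range(len(y)))

# Complete graph on 3 agents with unit weights.
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])

# Single-integrator agents: y(k+1) = y(k) + eps * u(k).
y = np.array([0.0, 2.0, 5.0])
eps = 0.2
for _ in range(400):
    u = np.array([control_input(i, y, A) for i in range(3)])
    y = y + eps * u
# The outputs approach the agreement space: their spread shrinks toward 0,
# and (for this symmetric graph and odd coupling) the mean is preserved.
```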

IV. THREAT MODEL
We describe the attack with a subset of agents V_A ⊂ V, called attacked nodes, that do not follow the protocol described by the control algorithm. These attacked nodes are able to modify either their input or directly their output such that the function J_ij is not minimized for any v_i ∈ V_A. To this end, the attacker has access to all information from the attacked nodes, including the control input calculated by the controller u_i, the state x_i, and the output y_i. Therefore, when the unattacked nodes try to synchronize with the attacked nodes, they desynchronize, making the whole system diverge. To design our attack mitigation strategy in a consensus system, we study a scenario where we can guarantee that an agreement can be reached. Therefore, we assume that there are some limitations on the capabilities of the attacker, given in the following assumption:

Assumption 1: The subgraph induced by the unattacked nodes V \ V_A is strongly connected.

Assumption 1 guarantees that, if a mitigation strategy can isolate the attacked nodes, ignoring them does not affect the convergence of the synchronization algorithm. In other words, making this assumption means that the synchronization state can be reached once the attacked nodes have been ignored. Without this assumption, the reconfiguration strategy still isolates the attacked nodes, but no consensus algorithm will synchronize all the unattacked nodes in the network. Some sufficient conditions that ensure Assumption 1 are:
• the number of attacked nodes is less than the κ-connectivity of the graph; or
• the number of neighbors of every unattacked node is greater than the number of attacked nodes.
Even though the first condition can be less conservative, the second one is useful in scenarios where the whole communication graph is unknown, as in distributed control scenarios or large-scale systems where the κ-connectivity can be difficult to compute.

V. GAME THEORY-BASED RECONFIGURATION STRATEGY
In schemes where the attacker manipulates a subset of nodes to perturb the network, a straightforward strategy to mitigate the impact of the attack is to reconfigure the network such that unattacked nodes attenuate the attacked nodes' actions and are able to synchronize, as illustrated in Fig. 1. Therefore, we propose a decision-making process based on game theory as a mechanism to reconfigure the weights over the network during the synchronization process to mitigate an attack. Each agent i uses the information available from the synchronization strategy to define a game with its neighbors through a loss function L_i(s_j, s_−i) as defined in Section II. If the loss function is determined by acting in a way that the function J_ij in Equation (3) is minimized, then agent i's decision-making will be such that it prefers to interact with those neighbors that allow it to reach a synchronization state. This interaction is quantified by the weighting element a_ij in the control action presented in Equation (2). Now, since agents only have partial information of the complete multi-agent system, they cannot anticipate their neighbors' performance J_ij without losing the distributed property of the synchronization protocol. Hence, we introduce a prediction strategy that allows every agent in the network to estimate the behavior of its neighbors toward minimizing J_ij without having full information of the system. Fig. 2 shows a block diagram of the proposed reconfiguration strategy. At each time step, an agent anticipates the behavior of its neighbors and modifies its interaction patterns by adjusting the weights a_ij in the control action. In the following, we discuss both the prediction strategy and the loss function.

A. TRAJECTORY ANTICIPATION
Let I^(k)_ij be the set that contains the historical information between nodes i and j at time k, defined as

I^(k)_ij := {J_ij(0), J_ij(1), ..., J_ij(k)}.

This set contains all the previous information that agent i has from agent j, characterized by its performance J_ij. Agent i assumes that its neighbor j behaves in a way that its performance can be described by some function g(·) as

Ĵ_ij(k+1) = g(I^(k)_ij; θ^(k)_ij),

where θ^(k)_ij are the parameters of the function g at time k. This function allows agent i to predict the behavior of its neighbor j at the next step. The parameters θ^(k)_ij are calculated using the information contained in I^(k)_ij and depend on the chosen prediction function g(·). These parameters can be estimated at each time step k by any online estimation method, ranging from a linear regression to a neural network that uses transfer learning concepts. Given the parameters θ^(k)_ij at time step k, every agent has access to an estimate of the behavior of all its neighbors over a prediction horizon of m steps using a recursive relation of the form

Ĵ_ij(k+m) = g(I^(k)_ij ∪ {Ĵ_ij(k+1), ..., Ĵ_ij(k+m−1)}; θ^(k)_ij).   (4)

The parameter m depends on the dynamics of the agents to which the reconfiguration algorithm is applied and can be tuned.
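As a concrete (hypothetical) instance, take g to be an affine one-step map Ĵ_ij(k+1) = θ₀ + θ₁ J_ij(k), fitted by least squares on the history set and rolled forward recursively for m steps:

```python
import numpy as np

def fit_affine(history):
    """Least-squares fit of J(t+1) = theta0 + theta1 * J(t) from a history list."""
    x = np.array(history[:-1])
    z = np.array(history[1:])
    X = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return theta

def predict(history, m):
    """Recursive m-step prediction: feed each estimate back into g."""
    theta0, theta1 = fit_affine(history)
    J = history[-1]
    preds = []
    for _ in range(m):
        J = theta0 + theta1 * J
        preds.append(J)
    return preds

# A geometrically decaying performance J(k) = 8 * 0.5^k is reproduced
# exactly by theta = (0, 0.5), so the predictions continue the decay.
history = [8.0, 4.0, 2.0, 1.0]
preds = predict(history, m=3)
```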

B. NETWORK RECONFIGURATION
The reconfiguration strategy allows each unattacked agent to change its neighbors' weights to mitigate the effect of an attacker's actions. With the information provided by the estimation in (4), which represents the cost of synchronizing with each neighbor, the agent can effectively choose the network weights used to synchronize. In a game-theoretic framework, we can describe the available pure strategies of every agent as its neighbors, that is, S_i = N_i. Therefore, every agent calculates the loss function L_i(s_j, s_−i) according to its neighbors' behavior to determine which neighbor is preferable to interact with. The performance of agent i when it only chooses agent j, i.e., uses strategy s_j ∈ S_i, can be computed as

L_i(s_j, s_−i) = Σ_{h=1}^{H} ξ^h Ĵ_ij(k+h),   (5)

where s_−i are the strategies used by its neighbors, H is the prediction horizon of the system performance, and ξ ∈ [0, 1] is a discount factor that weights future outcomes. Given that mixed strategies α_i ∈ P(S_i) are allowed, the expected loss is defined as

L_i(α_i, s_−i) = Σ_{j∈N_i} α_ij L_i(s_j, s_−i),   (6)

where α_i = [α_ij], j ∈ N_i, is the mixed strategy used by agent i, and α_ij is the probability that agent i chooses strategy s_j. Hence, Σ_{j∈N_i} α_ij = 1.
The proposed loss function in Equation (5) drives agents to choose only the neighbors with minimum cost L^(k)_i(s_j, s_−i), completely discarding the other ones. This situation could lead agents to make premature decisions, producing unnecessary disconnections between agents and, consequently, a deterioration in the system's performance. Thus, we propose to include a randomization mechanism in the loss function through the Shannon entropy. With this addition, the agents are not only forced to minimize the synchronization performance loss, but they also try to maintain as many connections as possible. The loss function can be redefined as

L̂_i(α_i, s_−i) = Σ_{j∈N_i} α_ij L_i(s_j, s_−i) + λ(k) Σ_{j∈N_i} α_ij ln α_ij,   (7)

where λ(k) is a regularization term that can change at every iteration k. This regularization term ensures that the performance loss and the entropy loss are comparable. We use a decreasing regularization term such that the performance loss gains importance over time. With a proper definition of the loss function, agents can choose the most profitable connections, i.e., the least expensive strategy, using the best response, and define the weights over the network as a*_ij(k) = α*_ij(k) d_i, where d_i is the degree of node i. Note that these modified weights preserve the node degrees d_i through time. The protocol defined in Equation (7) can be seen as a perturbed best-response dynamics [36], in which the players of a game try to choose their best mixed strategy while perceiving a noisy version of their pay-off. In our case, by perturbing the perceived pay-off L with the Shannon entropy, we force them to maintain more connections. Thus, each agent computes its input as

u_i(k) = Σ_{j∈N_i} a*_ij(k) χ(y_j(k) − y_i(k)).

The following theorem shows that, for each agent, we can find a closed-form solution to this problem at each step k.
Theorem 1. The solution to the optimization problem in Equation (7) is

α*_ij(k) = exp(−L_i(s_j, s_−i)/λ(k)) / Σ_{l∈N_i} exp(−L_i(s_l, s_−i)/λ(k)).

Proof: The Lagrangian of the constrained problem in (7) is given by

Λ(α_i, μ) = Σ_{j∈N_i} α_ij L_i(s_j, s_−i) + λ(k) Σ_{j∈N_i} α_ij ln α_ij + μ(Σ_{j∈N_i} α_ij − 1).

Setting ∂Λ/∂α_ij = L_i(s_j, s_−i) + λ(k)(ln α_ij + 1) + μ = 0 yields α_ij ∝ exp(−L_i(s_j, s_−i)/λ(k)), and normalizing over N_i so that Σ_j α_ij = 1 gives the result. ∎
Corollary 1.1. When λ(k) = 0 for a given k, the problem is formulated as

min_{α_i ∈ P(S_i)} Σ_{j∈N_i} α_ij L_i(s_j, s_−i),

whose solution corresponds to choosing any convex combination of the neighbors with minimum individual cost L_i(s_j, s_−i). This means that if a single neighbor has the lowest loss, that neighbor is selected; when two or more neighbors share the lowest loss, the solution can be any convex combination of those neighbors. The whole reconfiguration strategy to compute the weights in the control action for synchronization is summarized in Algorithm 1. Interestingly, the result in Theorem 1 has a similar form to the update rule computed by an agent in resilient distributed hypothesis testing, where each agent estimates a local belief about its neighbors' true behavior [37], [38]. In the following, we provide an interpretation of these results and analyze how the parameter λ(k) should be selected.
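Numerically, the closed form in Theorem 1 is a soft-min over the neighbor losses; a small sketch (loss values hypothetical) shows both regimes, the near-uniform mix for large λ and the hard argmin of Corollary 1.1 as λ → 0:

```python
import numpy as np

def softmin_weights(L, lam):
    """alpha*_ij = exp(-L_ij/lam) / sum_l exp(-L_il/lam)  (Theorem 1)."""
    z = np.exp(-(L - L.min()) / lam)   # subtract the min for numerical stability
    return z / z.sum()

L = np.array([1.0, 1.1, 9.0])          # third neighbor behaves much worse

alpha_large = softmin_weights(L, lam=50.0)  # large lambda: near-uniform, keeps links
alpha_small = softmin_weights(L, lam=0.05)  # small lambda: approaches hard argmin

# The control weights preserve the node degree d_i: a*_ij = alpha*_ij * d_i.
d_i = 3.0
a_star = alpha_small * d_i
```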

C. BEHAVIOR OF THE RECONFIGURATION ALGORITHM
Theorem 1 shows that playing the reconfiguration game results in weights that are computed from a soft-min function. This kind of function has been widely used in areas ranging from statistical mechanics to machine learning and hypothesis testing to map arbitrary values into probability distributions, providing a smooth version of the argmin operator [39].

Algorithm 1: Agent i's reconfiguration strategy for resilient distributed synchronization. At time k, for each j ∈ N_i: read the neighbor's output y_j(k); update the neighbor's performance J_ij(k); update the information set I^(k)_ij; …

The main parameter of the soft-min function, in our case λ(k), can be seen as a threshold defining when extreme values are discarded. If the difference between a value and the true minimum is comparable to λ(k), the two are treated as similar; if the difference is larger than λ(k), that option is discarded. For our mitigation strategy, this point of view restates the importance of the regularization term: if λ(k) = 0, the agents discard many of their neighbors, preserving only the connections with the agents of minimum loss, as noted in Corollary 1.1, which makes the reconfiguration strategy inconsistent. In addition, analyzing our strategy as a soft-min function provides information on how to choose λ(k). First, recall the relation between L_ij and Ĵ_ij established in Equation (5). Knowing this, the choice of λ(k) must ensure that no node is discarded prematurely, because attacked nodes do not necessarily diverge during the initial time steps. Therefore, the initial condition of the parameter is chosen large enough to make all neighbors comparable, i.e.,

λ(0) ≥ max_{j∈N_i} J⁺_ij(0),

with J⁺_ij(0) being the maximum possible value that Ĵ_ij(0) can take given some initial conditions. Additionally, λ(k) must be decreasing with respect to k so that the proposed algorithm is able to discard attacked nodes. However, if it decreases too fast, some unattacked nodes could be discarded. To avoid this situation, we can use the fact that the performance cost of every unattacked node decreases: there exists a constant β ∈ (0, 1) such that

J_ij(k+1) ≤ β J_ij(k) for every unattacked neighbor j.

This constant denotes the slowest decreasing rate over all unattacked nodes. Thus, the evolution of λ can be set as

λ(k) = λ(0) β^k.

Hence, with a proper choice of λ(k), the mitigation strategy described in Equation (7) will discard the attacked nodes.
At some time step k*, the loss L_ij of every attacked node will at least remain the same, while the loss of the unattacked nodes will decrease. This not only guarantees that the proposed reconfiguration algorithm removes the attacked nodes; it also states that the output synchronization algorithm still works in the absence of an attack, because the loss of every node decreases and the connections are preserved.
It is important to mention that this is a heuristic algorithm and that convergence to synchronization is not guaranteed unless the unattacked nodes remain connected. Through simulation case studies, we show that this algorithm works in a wide variety of synchronization environments in which agents are under attack.

VI. STUDY CASES
We implemented the strategy shown in Equation (7) in three different systems to show the performance of the proposed mitigation strategy: discrete consensus, consensus filters for sensor networks, and formation control of differential-drive robots in a realistic scenario. In these case studies, we conducted a weighted linear regression at each time step k to find the parameters of the prediction function g(·),

θ^(k)_ij = argmin_θ Σ_{t=0}^{k} γ^{k−t} (J_ij(t) − g(t; θ))²,

where γ is a discount factor that gives priority to the latest data. We chose linear regression because of its simplicity of implementation and because it has an explicit solution through the least-squares minimization problem, facilitating online estimation. Additional parameters for the following simulations are ξ = 0.75, H = 5 and γ = 0.75. For each multi-agent system, we conducted multiple simulations to observe the performance of the proposed algorithm under different communication graphs, numbers of attacked nodes, and initial conditions. For comparison purposes, we show the system's performance in four different scenarios for each case study. The first two scenarios are in the absence of an attack: one without the reconfiguration strategy, to show the performance of the system under normal operating conditions, and one with the mitigation strategy, to observe how it affects the expected performance. The last two scenarios are in the presence of an attack: first without the mitigation strategy, and then with it.

[Fig. 3: Outputs y_i and the performance indices in Equations (14) and (15) for a set of 1000 simulations (mean: solid line; standard deviation: light-colored shade).]
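The discounted least-squares fit used for g(·) can be sketched as follows (the affine form of g and the exact weighting are our assumptions):

```python
import numpy as np

def discounted_linreg(times, values, gamma):
    """Weighted least squares for g(t) = theta0 + theta1 * t, where sample t
    receives weight gamma^(k - t), emphasizing the most recent data."""
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    w = gamma ** (t.max() - t)                 # gamma^(k - t)
    X = np.column_stack([np.ones_like(t), t])
    W = np.diag(w)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ v)
    return theta                               # (intercept, slope)

# On exactly linear data the fit recovers the line regardless of the weights.
times = np.arange(6)
values = 2.0 + 0.5 * times
theta = discounted_linreg(times, values, gamma=0.75)
```

The closed-form normal equations are what makes this estimator cheap enough to run at every time step on every edge.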

A. DISCRETE CONSENSUS
In the first set of simulations, we implemented the proposed reconfiguration strategy for discrete consensus, which is a well-studied case of output synchronization. The dynamics of this system are described as

x_i(k+1) = x_i(k) + ε Σ_{j∈N_i} a_ij (x_j(k) − x_i(k)),  y_i(k) = x_i(k),   (13)

where ε > 0 is a sufficiently small step size. Additionally, we chose the attack model to be a bias attack on the network, i.e., attacked nodes follow the protocol defined in Equation (13) but add a constant term to desynchronize the network. This attack model can be defined as

x_i(k+1) = x_i(k) + ε Σ_{j∈N_i} a_ij (x_j(k) − x_i(k)) + τ_i,  i ∈ V_A,

where τ_i is the desynchronization bias. For the initial communication graph, we used a 7-node graph generated as a binomial random graph with connection probability p = 0.5 and weights a_ij(0) ∈ [0.8, 1.2]. The regularization parameter is set to follow the rule λ(k) = 7(0.95)^k, following the recommendations in Section V-C. To measure the performance of the system, we computed the mean of the average difference between every pair of unattacked agents. This can be described as

P(k) = (1/|D|) Σ_{(i,j)∈D} ‖y_i(k) − y_j(k)‖,   (14)

where D is the set of all possible pairs of unattacked nodes, and P(k) ∈ R is the performance index at time step k. To make the performance comparable across simulations regardless of initial conditions, we used the normalized performance index

P̄(k) = P(k)/P(0).   (15)
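A self-contained sketch of this case study without mitigation, under our assumed forms of the dynamics, the bias attack, and the indices (the ring topology and all constants are our choices, not the paper's):

```python
import numpy as np

N, eps, tau, K = 7, 0.1, 3.0, 300
attacked = {0}
unatt = [i for i in range(N) if i not in attacked]

# Ring communication graph with unit weights.
A = np.zeros((N, N))
for i in range(N):
    A[i, (i + 1) % N] = A[(i + 1) % N, i] = 1.0

def P(x):
    """Mean pairwise disagreement among unattacked nodes (our Eq. (14))."""
    return np.mean([abs(x[i] - x[j]) for i in unatt for j in unatt if i < j])

def simulate(with_attack):
    x = np.arange(N, dtype=float)                   # fixed initial condition
    P0 = P(x)
    for _ in range(K):
        x = x + eps * (A @ x - A.sum(axis=1) * x)   # sum_j a_ij (x_j - x_i)
        if with_attack:
            x[list(attacked)] += tau                # bias attack on node 0
    return P(x) / P0                                # normalized index (Eq. (15))

P_clean = simulate(False)    # consensus is reached: index essentially 0
P_attack = simulate(True)    # the bias keeps the unattacked nodes apart
```

Without the attack the index decays geometrically; with the attack the biased node drags its neighbors at different lags, so the pairwise disagreement persists, which is exactly the gap the reconfiguration strategy is meant to close.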
We ran a set of 1000 simulations to verify the algorithm's performance under different communication graphs and initial conditions. Fig. 3 shows the behavior of y_i and P̄ under the four scenarios. Figs. 3(a) and 3(e) show that the performance index decreases to 0, meaning that the agents achieve consensus in the absence of an attack and with no reconfiguration. Similarly, in Figs. 3(b) and 3(f), P̄(k) decreases as time increases, verifying that the reconfiguration algorithm does not interfere with the synchronization algorithm. On the other hand, Figs. 3(c) and 3(g) show how the desynchronization bias prevents the agents from synchronizing: they remain separated as they try to approach the attacked agent. Finally, Figs. 3(d) and 3(h) show that the inter-agent distance decreases even in the presence of an attacked node, demonstrating the proper functioning of the proposed algorithm for attack mitigation. This can be verified in Fig. 3(h), where the unattacked agents ignore the malicious agent and synchronize.
In addition to the previous set of simulations, we compare the performance of the proposed algorithm with other strategies for resilient reconfiguration in this simple yet illustrative case study. We implemented the moving target defense (MTD) strategy presented in [23] and the weighted-mean subsequence-reduced (W-MSR) algorithm presented in [22]. In MTD, at each time step each agent randomly connects and disconnects communication with its neighbors according to a probability p_d; here, we set p_d = 0.5. In W-MSR, at each time step an agent disconnects those neighbors whose state variable is strictly larger or smaller than its own, as long as there are at most F neighbors that satisfy this condition; here, we set F = 1. We ran an additional set of 1000 simulations under different communication graphs and initial conditions and tested our proposed algorithm against the competing algorithms. Fig. 4 shows the performance P̄ in Equation (15) for the different reconfiguration strategies, and Fig. 5 shows the trajectories of the agents for a single simulation.

[Figure: Performance indices in Equations (14) and (15) for a set of 1000 simulations of a discrete consensus system (a)(e) without reconfiguration, (b)(f) with the proposed reconfiguration introduced in Algorithm 1, (c)(g) with moving target defense (MTD) reconfiguration [23], and (d)(h) with weighted-mean subsequence-reduced (W-MSR) reconfiguration [22].]

Note that our algorithm outperforms the other algorithms in this scenario. While the random disconnections in MTD reduce the effect of the attacker on the network, the attack still affects the synchronization between unattacked nodes. Likewise, the W-MSR algorithm guarantees convergence only when it is applied to a special family of networks called robust networks [22]. Since the graphs considered here are not necessarily robust, there are cases in which this algorithm disconnects the network, perturbing the synchronization process even in the absence of an attack.
In contrast, our algorithm based on best-response games makes decisions based on the prediction of a general performance criterion and does not arbitrarily reduce the connectivity of the network, allowing for resilient synchronization in a wide variety of scenarios.
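For reference, one W-MSR update step can be sketched as follows (our implementation of the rule described above, with F = 1 by default):

```python
import numpy as np

def wmsr_step(x, neighbors, F=1):
    """One W-MSR iteration: each agent discards up to F neighbor values
    strictly above its own and up to F strictly below, then averages the
    remaining neighbor values together with its own state."""
    new_x = x.copy()
    for i, Ni in neighbors.items():
        vals = [x[j] for j in Ni]
        hi = sorted((v for v in vals if v > x[i]), reverse=True)[:F]
        lo = sorted(v for v in vals if v < x[i])[:F]
        keep = vals.copy()
        for v in hi + lo:
            keep.remove(v)
        new_x[i] = np.mean(keep + [x[i]])
    return new_x

# Node 2 is an attacker holding an extreme value; with F = 1 its value is
# filtered out by the unattacked nodes 0 and 1.
x = np.array([0.0, 1.0, 100.0])
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
x1 = wmsr_step(x, neighbors)
```

Note how the filtering is purely value-based: it needs no estimate of who is attacked, but it also discards extreme honest values, which is why robustness of the graph matters for its convergence guarantee.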

B. CONSENSUS FILTERS FOR SENSOR NETWORKS
After showing the performance in a standard synchronization multi-agent system, we tested our reconfiguration strategy in noisy environments. For this purpose, we implemented our strategy in the sensor fusion network scheme developed in [7]. All sensor agents compare their measurements with their neighbors', driving them to the consensus agreement space. This makes all agents' states preserve the mean of the measurements while reducing their variance. The dynamics of these sensor agents are described as

x_i(k+1) = x_i(k) + ε [ Σ_{j∈N_i} a_ij (x_j(k) − x_i(k)) + Σ_{j∈J_i} a_ij (u_j(k) − x_i(k)) ],  y_i(k) = x_i(k),   (16)

where J_i := N_i ∪ {i}, a_ii = 1 for all i, ε > 0 is a step size, and u_i is the measurement of sensor i. With the dynamics described in Equation (16), the agents merge their measurements, reducing the variance along the network. In this simulation, all agents sample a standard normal random variable. The attacked nodes directly modify their output by adding a sigmoid function to disturb the network, i.e.,

y_i(k) = x_i(k) + σ(k),  i ∈ V_A,

where σ(·) is a sigmoid function of time. This attack introduces not only a bias but also noisier measurements. In this case, the attack disperses the agents, making all states differ from the real measurement. This misinformation can lead the system to wrong decisions because the observed measurement is not correct and, in most cases, different for every node.
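The variance-reduction effect can be reproduced with a simplified discrete consensus filter in the spirit of [7] (this is our sketch on a ring graph, not the paper's exact Equation (16)):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, eps = 20, 400, 0.05
nbrs = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}  # ring topology
u = rng.standard_normal(N)     # static noisy measurements, one per sensor
x = u.copy()                   # initialize states at the measurements

for _ in range(K):
    x_new = x.copy()
    for i in range(N):
        Ji = nbrs[i] + [i]     # J_i = N_i ∪ {i}
        x_new[i] = x[i] + eps * (sum(x[j] - x[i] for j in nbrs[i])
                                 + sum(u[j] - x[i] for j in Ji))
    x = x_new
# The filtered states keep the measurements' mean but have a smaller spread:
# each node's state is a local smoothing of the noisy samples.
```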
To check the performance in large-scale systems, we implemented the system described in Equation (16) with a 100-node Erdős–Rényi unweighted random graph as the initial communication graph and, to test the worst possible scenario, we attacked the κ − 1 most important nodes according to Katz centrality, as defined in Equation (1), so as to preserve Assumption 1. The regularization parameter is defined by the rule λ(k) = 7(0.9)^k + 2. The performance of the system is evaluated by analyzing the average output of the unattacked agents, i.e.,

P(k) = (1/|V \ V_A|) Σ_{i∈V\V_A} y_i(k).   (17)

For a set of 150 simulations, Fig. 6 shows the behavior of y_i and P from Equation (17). According to [7], unattacked nodes preserve the mean of the original measurements, as shown in Figs. 6(a) and 6(e). The reconfiguration strategy does not perturb the behavior of the network in the absence of an attack, as presented in Figs. 6(b) and 6(f), which is a desirable property in this type of cyber-security strategy. When the described attack occurs, the average of the agents deviates from the expected mean, leading to misinformation in the sensor network, as shown in Figs. 6(c) and 6(g). When the mitigation strategy is implemented, the nodes along the network can ignore the attacked nodes, preserving again the mean of the sampled variable u_i, as presented in Figs. 6(d) and 6(h).

C. FORMATION CONTROL IN DIFFERENTIAL DRIVE ROBOTS
Finally, we tested our reconfiguration strategy in a realistic environment involving the control of a set of differential-drive robots. We implemented five Pioneer-like differential-drive robots in an environment created in CoppeliaSim from Coppelia Robotics on the Robot Operating System (ROS). This environment recreates the physics of the robots and allows for the application of a low-level control policy to drive the speed of each wheel and a high-level control to define each robot's decision-making. Fig. 7 shows a snapshot of the differential-drive robots in the simulation environment. The main goal of the set of robots is to reach a regular pentagon formation while avoiding those whose control policy has been attacked. The decision-making of each robot is shown in Fig. 8. Each robot measures its position y_i(k) in the global frame along with the rotation angle θ between this frame and the robot's. Robot i is able to read its neighbors' positions y_j(k), for all j ∈ N_i, according to a communication topology; in this set of simulations, we assume the robots communicate over a fully connected network. Given its own position and its neighbors', robot i estimates the reference position y_i^ref that it should follow at the next time step. This reference is computed using a protocol based on a consensus scheme and the proposed reconfiguration strategy in order to reach the desired formation and avoid attacked robots. The formation control can be achieved using the Laplacian-based control

y_i^ref(k) = y_i(k) + Σ_{j∈N_i} a_ij (y_j(k) − y_i(k) − ∆_ij),

where ∆_ij is the expected displacement between agents that defines the pentagonal formation and a_ij is given by Algorithm 1.
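A minimal sketch of this Laplacian-based reference computation (the sign convention for ∆_ij, i.e., the desired displacement y_j − y_i, is our assumption):

```python
import numpy as np

def formation_reference(i, Y, A, Delta):
    """y_ref_i = y_i + sum_j a_ij * (y_j - y_i - Delta_ij), where Delta[i, j]
    is the desired displacement from robot i to robot j in the formation."""
    y_ref = Y[i].copy()
    for j in range(len(Y)):
        if A[i, j] > 0:
            y_ref += A[i, j] * (Y[j] - Y[i] - Delta[i, j])
    return y_ref

# Two robots already in formation: the correction term vanishes and the
# reference equals the current position.
Y = np.array([[0.0, 0.0], [1.0, 0.0]])
Delta = np.zeros((2, 2, 2))
Delta[0, 1] = [1.0, 0.0]      # robot 1 should sit one unit to robot 0's right
Delta[1, 0] = [-1.0, 0.0]
A = np.array([[0., 1.], [1., 0.]])
ref0 = formation_reference(0, Y, A, Delta)
```

When the robots are displaced from the formation, the same expression pulls each reference toward the position that restores the desired inter-robot displacements, with a_ij from Algorithm 1 down-weighting attacked neighbors.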
Given the reference position y_i^ref, robot i computes the required change in y_i, defined as ẇ_i, to reach the reference position. This is achieved by a data-driven controller known as MFAC [40], which performs an online data-based estimation of the nonlinear dynamics of the robot in the environment. To relate this change in the robot's global-frame coordinates to the angular velocities of the robot's wheels, φ̇_{i,1} and φ̇_{i,2}, a set of kinematic equations that consider a rotation map and the rolling and non-sliding constraints is applied. A proportional-integral-derivative (PID) controller defines the policy that drives each wheel given the angular velocities φ̇_{i,1} and φ̇_{i,2}. Details of the kinematic equations of the robot and the MFAC control are given in the appendix.
To disturb this system, one of the agents drives the whole formation to a new final position. This agent defines its reference position as

y_i^ref(k) = r_a,  i ∈ V_A,

where r_a ∈ R³ is the position of the attacked robot, moving the whole formation to a new synchronization point as shown in Fig. 9(c). Here, by perturbing only one node, all agents move to an unexpected region, making this type of attack a real hazard in scenarios that involve autonomous vehicles [23]. With this in mind, we measured the performance of the system by comparing how much every agent deviates from the ideal case, i.e., the case with neither an attack nor a reconfiguration strategy, as for example in Fig. 9(a). This can be described mathematically as

P(k) = Σ_{i∈V} ‖y_i(k) − y_i^r‖,   (18)

where y_i^r is the final position of agent i in the ideal case. As in the discrete consensus simulations, we used a normalized version of the performance index to make all simulations comparable, that is, P̄(k) as described in Equation (15). We implemented the mitigation strategy with ξ = 0.75 and λ(k) = 3000(0.99)^k as the regularization parameter. Furthermore, the prediction horizon is set to H = 5 to capture the nonlinear behavior of the robotic system. We tested our algorithm in 10 simulations to verify its performance under different initial robot positions.
Figs. 9(a) and 9(e) show how the performance index of the system decreases to 0, as expected from the performance definition in Equation (18), since there are no attacks. Figs. 9(b) and 9(f) show the case in which our strategy is implemented in the absence of a malicious attack. The performance index still decreases, indicating that the reconfiguration strategy only slightly perturbs the agents' final positions when it is implemented. However, the performance decreases at a slower rate than in the ideal case due to the changes in the network's weights. On the other hand, the attacker's deviation makes all agents move away from the expected positions $y^r_i$, as presented in Figs. 9(c) and 9(g). Nevertheless, the proposed reconfiguration strategy clearly reduces the effect of the attacker on the network, bounding the dissimilarity between the original formation and the one obtained after reconfiguration, as shown in Figs. 9(d) and 9(h).

VII. CONCLUSIONS
We introduced a heuristic algorithm based on a best-response game for attack mitigation, in which every unattacked agent in the network dynamically changes its weights to reach a synchronization state. The weight-selection mechanism is based on the game-theoretic concept of Nash equilibrium, where all agents try to minimize a cost related to the performance of their neighbors. We presented simulations of the proposed mitigation strategy on several systems covering a wide variety of scenarios, verifying the capability of the algorithm to mitigate different kinds of attacks.
As future work, we propose to implement and evaluate the suggested strategy on real physical systems. Since the purpose of these secure strategies is to prevent and mitigate malfunctions in cyber-physical systems, it is important to analyze the performance of our strategy taking into account implementation aspects such as delays, faults, and communication noise [41], [42], [43]. Other threat models, regression models based on online estimation, and synchronization challenges such as containment control in distributed systems can also be studied [44], [45].

A. KINEMATICS OF THE ROBOT
A differential robot is represented by $w = [x, z, \theta]^T$, which contains the 2-dimensional coordinates of the middle point between its wheels in the global reference frame $(x, z)$, along with the rotation angle $\theta$ between this frame and the robot's frame. The rotation matrix
$$R(\theta) = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
maps the point $w$ to the robot's frame as $w_R = R(\theta)w = [x_R, z_R, \theta_R]^T$, where the subindex $R$ refers to the robot's reference frame.
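This rotation map can be sketched directly (function names are assumptions):

```python
import numpy as np

def rotation(theta):
    """Rotation map R(theta) from the global frame to the robot frame."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[  c,   s, 0.0],
                     [ -s,   c, 0.0],
                     [0.0, 0.0, 1.0]])

def to_robot_frame(w):
    """w = [x, z, theta] in the global frame; returns w_R = R(theta) @ w."""
    return rotation(w[2]) @ w
```

For $\theta = 0$ the map is the identity, and for $\theta = \pi/2$ the global $x$-axis aligns with the robot's lateral axis, as expected.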
To associate the velocity components of the robot in the global frame with the velocities of each of its wheels, the rolling and non-sliding equations derived from the physical structure of the robot are used. The rolling equation, which describes the motion of the robot for each wheel based on how the wheel is mounted on the chassis, is defined as
$$\dot{x}_R \sin(\alpha + \beta) - \dot{z}_R \cos(\alpha + \beta) - \dot{\theta}_R\, l \cos(\beta) = q\dot{\varphi},$$
where $2l$ is the distance between the wheels, $q$ is the radius of the particular wheel, and $\dot{\varphi}$ is its angular speed. The constants $\alpha$ and $\beta$ are specific to each wheel: $\alpha = -\pi/2$ and $\beta = \pi$ for the right wheel, and $\alpha = \pi/2$ and $\beta = 0$ for the left one.
The non-sliding equation restricts, under the assumption of unidirectional motion of the wheels, the direction in which the robot cannot move, and is defined as
$$\dot{x}_R \cos(\alpha + \beta) + \dot{z}_R \sin(\alpha + \beta) + \dot{\theta}_R\, l \sin(\beta) = 0.$$
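Both constraints can be checked numerically; the following is a sketch using the stated wheel constants (function names are assumptions):

```python
import numpy as np

# (alpha, beta) per wheel, as given in the text
WHEELS = {
    "right": (-np.pi / 2, np.pi),
    "left":  ( np.pi / 2, 0.0),
}

def rolling_residual(xdot, zdot, thetadot, phidot, q, l, alpha, beta):
    """Left- minus right-hand side of the rolling constraint;
    zero when the wheel rolls consistently with the body motion."""
    return (xdot * np.sin(alpha + beta)
            - zdot * np.cos(alpha + beta)
            - thetadot * l * np.cos(beta)
            - q * phidot)

def nonsliding_residual(xdot, zdot, thetadot, l, alpha, beta):
    """Left-hand side of the non-sliding constraint; zero when the
    body motion involves no lateral slip at the wheel."""
    return (xdot * np.cos(alpha + beta)
            + zdot * np.sin(alpha + beta)
            + thetadot * l * np.sin(beta))
```

For pure forward motion with $\dot{\varphi} = \dot{x}_R / q$ on each wheel, both residuals vanish, which is a quick sanity check of the signs above.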
By posing the associated system of equations and solving for the angular velocities, the following matrix system is obtained to compute the angular velocities of each wheel from the desired linear velocities, where subindex 1 refers to the right wheel and subindex 2 to the left wheel [46]:
$$\begin{bmatrix} \dot{\varphi}_1 \\ \dot{\varphi}_2 \end{bmatrix} = \frac{1}{q}\begin{bmatrix} 1 & 0 & l \\ 1 & 0 & -l \end{bmatrix}\begin{bmatrix} \dot{x}_R \\ \dot{z}_R \\ \dot{\theta}_R \end{bmatrix}.$$

B. MODEL-FREE ADAPTIVE CONTROL
Model-free adaptive control (MFAC) is a data-based control technique used to determine the rate of change $\dot{w}_i$ of the position of robot $i$ that is required to reach the reference point $y^{\text{ref}}_i$. This strategy relies on an iterative estimation of the pseudo-partial-derivative matrix $\varphi_i$ [40], which captures the relationship between the output and the input of a nonlinear system. In our case, $\varphi_i$ estimates the relationship between the position $y_i$ of robot $i$ and the velocity $\dot{w}_i$ required to reach the reference position $y^{\text{ref}}_i$. The update rule for $\varphi_i$, with $\varphi_i(1) = I$, $\eta = 1$, and $\mu = 1$, updates the matrix based on the changes in the input and the output, $\Delta y_i(k) = y_i(k) - y_i(k-1)$ and $\Delta\dot{w}_i(k) = \dot{w}_i(k) - \dot{w}_i(k-1)$.
Given a reference signal $y^{\text{ref}}_i$, and based on the calculated pseudo-partial derivative $\varphi_i$, the control signal $\dot{w}_i$ is obtained from an MFAC control law with $\nu = 1$ and $\rho = 2.7$.
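Since the original expressions are not reproduced in this excerpt, the following sketch ties the appendix together using the wheel-speed map derived from the rolling constraints and the standard compact-form MFAC update and control law from the MFAC literature, with the stated constants; the paper's exact MIMO form may differ in transposes and dimensions.

```python
import numpy as np

Q, L = 0.03, 0.10   # wheel radius and half wheel separation (illustrative)
ETA, MU = 1.0, 1.0  # MFAC estimation constants, from the text
NU, RHO = 1.0, 2.7  # MFAC control constants, from the text

def wheel_speeds(w_dot):
    """Map robot-frame velocities [xdot_R, zdot_R, thetadot_R] to the
    angular speeds of the right (1) and left (2) wheels, obtained from
    the rolling constraint with (alpha, beta) = (-pi/2, pi) and (pi/2, 0)."""
    xdot, _, thetadot = w_dot
    return np.array([(xdot + L * thetadot) / Q,   # right wheel
                     (xdot - L * thetadot) / Q])  # left wheel

class MFAC:
    """Sketch of a compact-form MFAC loop: the pseudo-partial-derivative
    matrix phi is updated from input/output increments, then used to
    compute the next velocity command (standard form, an assumption)."""

    def __init__(self, dim):
        self.phi = np.eye(dim)        # phi_i(1) = I, as in the text
        self.prev_u = np.zeros(dim)   # previous command w_dot
        self.prev_du = np.zeros(dim)  # previous input increment
        self.prev_y = None            # previous measured position

    def step(self, y, y_ref):
        if self.prev_y is not None:
            # update phi from Delta y and Delta w_dot
            dy = y - self.prev_y
            denom = MU + self.prev_du @ self.prev_du
            self.phi += ETA * np.outer(dy - self.phi @ self.prev_du,
                                       self.prev_du) / denom
        # control law driving y toward y_ref
        u = self.prev_u + RHO * self.phi.T @ (y_ref - y) / (
            NU + np.linalg.norm(self.phi) ** 2)
        self.prev_du = u - self.prev_u
        self.prev_u, self.prev_y = u, y
        return u
```

At each step, the command returned by `MFAC.step` would be passed through `wheel_speeds` and then tracked by the per-wheel PID controllers.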