Distributed Detection and Mitigation of Biasing Attacks over Multi-Agent Networks

This paper proposes a distributed attack detection and mitigation technique based on distributed estimation over a multi-agent network, where the agents take partial system measurements susceptible to (possible) biasing attacks. In particular, we assume that the system is not locally observable via the measurements in the direct neighborhood of any agent. First, for performance analysis in the attack-free case, we show that the proposed distributed estimation is unbiased with bounded mean-square deviation in steady-state. Then, we propose a residual-based strategy to locally detect possible attacks at agents. In contrast to the deterministic thresholds in the literature assuming an upper bound on the noise support, we define the thresholds on the residuals in a probabilistic sense. After detecting and isolating the attacked agent, a system-digraph-based mitigation strategy is proposed to replace the attacked measurement with a new observationally-equivalent one to recover potential observability loss. We adopt a graph-theoretic method to classify the agents based on their measurements, to distinguish between the agents recovering the system rank-deficiency and the ones recovering output-connectivity of the system digraph. The attack detection/mitigation strategy is specifically described for each type, which is of polynomial-order complexity for large-scale applications. Illustrative simulations support our theoretical results.


INTRODUCTION
DATA (or measurements) regarding many real-world systems, such as wireless sensor networks, multi-agent robotic systems, block-chain and cloud-computing, and smart energy networks, are naturally distributed over large geographical regions [1], [2]. Collecting all these data at a central coordinator (or a fusion center) for processing and learning is tedious and impractical in many applications. Distributed learning or inference is thus typically preferred, since it does not require long-range communication to a central unit. The corresponding distributed strategies are practically feasible as they rely only on local data processing and local communication among neighboring agents. However, such decentralized strategies are vulnerable to malicious attacks. In this paper, we consider distributed detection and mitigation of biasing attacks at sensors/agents performing distributed estimation over a large-scale dynamical system. Potential applications include secure distributed estimation over Cyber-Physical-Systems (CPS) [3]-[9], Internet-of-Things (IoT) [10]-[12], smart cities [13], social networks [14]-[16], and power-grid monitoring systems [2], [17]-[25], among others. In distributed estimation (or filtering) applications [26]-[28], a multi-agent network refers to a group of agents with sensing, data-processing, and communication capabilities, which take (noisy) outputs or measurements of the dynamical system, share their information over a network, and process the received data locally to track the system state. In case of erroneous or biased data [29], [30], the distributed estimation performance is significantly degraded if the biased measurements are necessary for observability. Recall that observability refers to the possibility of inferring the (entire) states of the dynamical system by tracking outputs/measurements of a subset of states over a finite time.
This is more challenging in single time-scale estimation, with only one step of data-fusion between every two consecutive time-steps of system dynamics and with no local observability (i.e., the system is not observable in the neighborhood of any agent) [26], [27], [31]-[35]. This differs from double time-scale estimation, where all necessary information for observability is directly communicated to every agent from its neighbors, which requires considerably more communication traffic and information exchange over the network. In the single time-scale setting with no local observability, the biased (attacked) measurement affects the residual (defined as the deviation of the estimated/expected output from the original system output [36]) at more agents, making it harder to locally isolate the faulty sensor. Such additive bias could be, for example, due to false-data injection attacks [37]. The general idea in this work is to locally detect and isolate such attacks and, further, to reconfigure the multi-agent network using substitute measurements to recover the (potential) loss of observability.
arXiv:2109.09329v1 [eess.SY] 20 Sep 2021
The distributed estimator in this paper performs consensus (on the received data) at the same time-scale as the underlying system (single time-scale), see e.g., [38], [39] for details. We use structured systems theory [39]-[41] to guarantee generic or structural observability. This helps to partition the system outputs (fed to the agents) into certain observationally-equivalent classes [42]. This gives the set of necessary agents for estimation (whose removal makes the system unobservable) and the set of redundant agents (whose removal results in no observability loss). Subsequently, different strategies are used to substitute the faulty sensor and design inter-agent communications. We propose our attack detection and mitigation strategy based on this specific agent classification. In particular, we show that isolation of the attacks related to system rank-deficiency is more challenging and requires certain constrained gain design.
Recall that the system rank refers to the rank of the matrix associated with the linear system of differential equations (in the state-space representation), see Section 2.1 for more details.
Comparison with related literature: this work develops a joint distributed estimation and attack detection/isolation technique, and extends the prior works on resilient distributed estimation subject to unreliable sensor measurements [43], [44] and adversarial attacks [32], [45]-[52]. These works do not detect/isolate the attack, but estimate the system in the presence of (specific) attacks with bounded (steady-state) error, while making simplifying assumptions, e.g., a noise-free model. Our work extends [32], [43]-[52] by further considering distributed/localized techniques to locate the attacked sensor. Further, this work differs from many works on distributed estimation in the literature by relaxing the observability assumption; for example, [26], [27], [31]-[35] assume local observability at some (or all) agents. In contrast, and similar to [28], [53], [54], our work makes no such restrictive assumption. However, [28], [53], [54] perform many iterations of data-fusion (consensus) between two consecutive system steps (double time-scale estimation), requiring a much faster data-processing/communication rate.
In the context of adversarial attacks, most observer-based detection scenarios assume system and/or measurement noise with bounded support, i.e., they consider an upper bound on the noise variable [55]-[59]. In this paper, we make no such assumption; instead, the noise is assumed to be of infinite support (i.e., it can take any arbitrarily large value with bounded second-order moment). Therefore, we propose probabilistic attack-detection thresholds, in contrast to the deterministic threshold design (or flag value) in observer-based detection methods [55]-[59]. In another line of research [60]-[67], distributed attack detection without observer/estimator design is considered. These works consider a multi-agent network aiming to detect (typically Byzantine) attacks in a sensed signal in a distributed way, with no estimation purpose (due to unknown system model). For example, [7] uses innovation variance to detect attacks (component malfunctions) in linear-quadratic-Gaussian (LQG) CPS models. However, our main goal is to detect the attacks (in the form of biasing anomalies changing the true output values [30]) deteriorating distributed estimation performance, and, further, to provide a mitigation strategy to restore observability (more precisely, distributed observability [68]). In this regard, this paper performs simultaneous distributed estimation and attack-detection, which makes it different from [7], [60]-[67], which perform only detection.
Of relevance are also watermarking strategies [69], [70], which inject a known input signal (watermark) into the system and track this watermark in the outputs using Chi-square testing ($\chi^2$-detector). Such input injection is not possible for tracking autonomous systems, and thus, physical watermarking is impractical in such cases. The distributed strategy in this work is not limited to full-rank LTI systems, in contrast to the distributed estimators in [29], [33]-[35], [71] over strongly-connected (SC) sensor-networks. Further, unlike the static parameter estimation in [72] and the noiseless centralized attack-detection/estimation in [31], this work is based on distributed estimation of noise-corrupted linear systems. Another relevant topic is compressive sensing [73]-[78], which translates the data into a compressed dimension, shares and combines the data, reconstructs it to the full dimension, and performs a diffusion-based [78] or least mean square (LMS) update [75]-[77] to estimate the original signal. Although the compressed transmission of data is applicable in our work (to reduce the communication burden), distributed dynamic observability makes our work different from [25], [75]-[78], which are based on static observability irrespective of the dynamic system model. Recall that this is referred to as the Static Linear State-Space (SLS) model in the detection literature [36] and differs from our solution considering the Linear Dynamical State-Space (LDS) model (see footnote 1). Similarly, this work differs from the centralized estimation in [73], [74] with certain assumptions on the sparsity of the initial states [73], [74] or system rank [73]. Autoencoder-based learning is used in some works [80]-[83] to distinguish (classify) faulty/attacked data from non-attacked measurement data. In smart-grid applications, the PMU measurements are used to train the detector via either supervised learning [80], [81] or unsupervised learning [82].
No dynamics are considered in these works (SLS model), in contrast to our (distributed) observability-based LDS model. Further, [80]-[82] only perform detection with no aim of estimation in the absence of attacks, while some works (see references in [83]) only perform learning-based estimation with no possibility of detection. Recall that noise (in system dynamics and/or output) plays a key role in LDS detection. As mentioned before, the assumption on the noise support (finite or infinite) and its value in the finite case affect the performance of the detection mechanism [55]-[59]. Similarly, noise in the output data affects the SLS detection performance, e.g., in power-system applications [25], [75]-[78]. See more details along with a review of centralized physics-based detection mechanisms in [36].
Main contributions: (i) Our observer-based detection strategy is localized and distributed over the multi-agent network with no local observability assumption at any agent, but global observability at the group of agents. This is key in large-scale applications, as it enables each agent to detect a (possible) attack on its received output with no central coordination, in contrast to centralized detection scenarios. (ii) Using certain agent classification based on system-rank, we develop detection and attack isolation strategies which are specific to the measurement types based on the system dynamics (LDS model) (see Section 2.2 for detailed explanation). (iii) The noise is considered over an infinite range with no constraint/bound on its support, which is more realistic for real-world applications (see Remark 1). In this sense, our attack detection and mitigation is categorized as probabilistic (vs. deterministic) thresholding. (iv) In order to prevent repetitive attacks at the same agent by the adversary, we consider an attack mitigation strategy to replace the biased measurement with an observationally-equivalent one (borrowing results from [42], [84]). We emphasize that the proposed algorithms for threshold design, agent classification, and mitigation via observational equivalency are of polynomial-order complexity.
Footnote 1. Using the dynamic model of the system (LDS case), fewer outputs are needed to reconstruct the full state of the system (dynamic observability), while in the static or SLS case (with no information of system dynamics) in general more outputs (as many as system states) are needed. Having fewer outputs in the SLS case results in an under-determined system of linear equations (unobservability), which mandates substitute recovering solutions such as compressive-sensing or auto-encoder neural networks [79]. A compressive-sensing-based example for the smart-grid application is given in [25], which requires no rank condition on the SLS model.
Notation: Throughout this paper, scalar and (column) vector variables are respectively represented by lower-case and bold lower-case letters. Further, capital letters represent matrices. The induced 2-norm of the matrix $A$ is defined as $\|A\|_2 = \sqrt{\lambda_n}$, where $\lambda_n = \rho(A^\top A)$ and $\rho(\cdot)$ denotes the spectral radius of a matrix. Further, $|\cdot|$ denotes the Euclidean norm. Table 1 summarizes the notation in this paper.
Table 1. Notation.
Symbol                                          Description
$k$                                             time index
$x_k$, $y_k$                                    column-vector of states, measurements
$\boldsymbol{\nu}_k$, $\boldsymbol{\zeta}_k$    zero-mean system and measurement noise
$\boldsymbol{\tau}_k$                           attack vector at time $k$
$A$, $C$                                        system and measurement matrix
$E$, $R$                                        system and measurement noise covariance
$c_j$                                           measurement matrix (column-vector) at agent $j$
$\alpha$, $\beta$, $\gamma$                     types of agents
$G_A$                                           system digraph associated with system matrix $A$
$G_N$, $G_\alpha$, $G_\beta$                    communication network of agents
$N_\alpha(i)$, $N_\beta(i)$                     set of neighbors of agent $i$ over networks $G_\alpha$, $G_\beta$
$\mathcal{N}(\cdot, \cdot)$                     Gaussian distribution
$\mathcal{C}$, $\mathcal{S}^p$                  contraction and parent SCC in the system graph
$W$                                             stochastic fusion-matrix associated to $G_\beta$
$U$                                             adjacency matrix of $G_\alpha$
$K$                                             gain matrix (with $K_i$ as $i$-th diagonal-block)

Linear Dynamical System
Following the discussions in Section 1, we consider noise-corrupted linear discrete-time systems (LDS model [36]) as,
$$x_{k+1} = A x_k + \boldsymbol{\nu}_k, \qquad (1)$$
with $x_k \in \mathbb{R}^n$ as the column-vector of states at time $k$, $A$ as the system matrix, and $\boldsymbol{\nu}_k \sim \mathcal{N}(0, E)$ as the system noise vector. Throughout the paper, the system-rank refers to the rank of the system matrix $A$. Consider a group of $N$ agents with scalar outputs given by $y^i_k = c_i^\top x_k + \zeta^i_k + \tau^i_k$ and the vector form as,
$$y_k = C x_k + \boldsymbol{\zeta}_k + \boldsymbol{\tau}_k, \qquad (2)$$
with $y_k \in \mathbb{R}^N$ as the column-vector of state measurements (or system outputs) $y_k = (y^1_k, \dots, y^N_k)^\top$, $\boldsymbol{\zeta}_k = (\zeta^1_k, \dots, \zeta^N_k)^\top \sim \mathcal{N}(0, R)$ as the measurement noise vector, and $\boldsymbol{\tau}_k = (\tau^1_k, \dots, \tau^N_k)^\top$ as the column-vector of biasing attacks at the agents. We assume an arbitrary attack $\boldsymbol{\tau}_k$ by the adversary, e.g., both fixed stationary and non-stationary attacks are considered for simulation (Section 5). Further, the measurement matrix $C = [c_1^\top; \dots; c_N^\top]$ is the column concatenation of the row-vectors $c_i^\top$ associated with agent $i$ (with ";" as column concatenation). Standard assumptions on Gaussianity and independence of noise terms are considered. For example, it is typical to assume that the sensor measurements are independent, making the measurement noise covariance matrix $R$ diagonal.
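As a rough illustration of the model (1)-(2), the following sketch simulates a small noisy linear system whose output is biased at one agent. It is only a minimal sketch under stated assumptions: the matrices, noise levels, bias value, and the helper name `simulate` are hypothetical and not taken from the paper, and both noise covariances are assumed diagonal with equal entries.

```python
import random

def simulate(A, C, steps, sys_std, meas_std, bias, seed=0):
    """Simulate x_{k+1} = A x_k + nu_k and y_k = C x_k + zeta_k + tau_k.

    bias[i] is a constant attack tau^i added to agent i's output
    (a fixed stationary attack; zero means no attack on that agent).
    """
    rng = random.Random(seed)
    n, N = len(A), len(C)
    x = [1.0] * n                      # hypothetical initial state
    outputs = []
    for _ in range(steps):
        # measurement: y^i = c_i^T x + zeta^i + tau^i
        y = [sum(C[i][s] * x[s] for s in range(n))
             + rng.gauss(0.0, meas_std) + bias[i] for i in range(N)]
        outputs.append(y)
        # state update with zero-mean Gaussian system noise
        x = [sum(A[r][s] * x[s] for s in range(n)) + rng.gauss(0.0, sys_std)
             for r in range(n)]
    return outputs

A = [[0.9, 0.2], [0.0, 1.05]]          # an unstable system matrix is allowed
C = [[1.0, 0.0], [0.0, 1.0]]           # two agents, each measuring one state
clean = simulate(A, C, 50, 0.1, 0.1, bias=[0.0, 0.0])
attacked = simulate(A, C, 50, 0.1, 0.1, bias=[0.0, 3.0])
```

With the same seed, the two runs share identical noise realizations, so the outputs of the attacked agent differ from the clean ones exactly by the injected bias, which is the quantity the detection logic later tries to expose.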

Remark 1.
Several papers in the literature (e.g., [55]-[59]) assume constrained noise $|\boldsymbol{\nu}_k| < \delta$ and/or $|\boldsymbol{\zeta}_k| < \delta$, where the upper bound $\delta$ on the noise support sets the deterministic thresholds for attack detection. For example, in [56] the deterministic threshold at sensor $i$ is defined as $D_i = \|O_i\|_2 \|e^i_k\|_2 + 2\delta$, with $\|O_i\|_2$ and $\|e^i_k\|_2$ as the 2-norm of the observability Gramian and the state-estimation error, respectively. In contrast, we make no such finite support assumption (loosely speaking, $\delta \to \infty$), while it is standard to assume that the second moments of the noise terms are finite, i.e., $E(\boldsymbol{\nu}_k^\top \boldsymbol{\nu}_k) < \infty$ and $E(\boldsymbol{\zeta}_k^\top \boldsymbol{\zeta}_k) < \infty$. Assuming unbounded $\delta$, the deterministic threshold, for example $D_i$ in [56], also goes unbounded ($\to \infty$), and thus, no attack can be detected. Similar arguments hold for [55], [57]-[59].

Agent Classification based on Structural Analysis
The notion of observability used throughout this paper is structural [40], [86], [88] and the theory is built on this notion. It is known that the rank deficiency of matrix $A$ and strong-connectivity of the system digraph $G_A$ affect its structural observability properties, and further, its estimation performance. In this direction, using structured systems theory and generic analysis [40], [88], we propose a specific sensor/agent classification based on the structure (zero-nonzero pattern) of the system matrix $A$ and the system digraph $G_A$. Using the theory developed in [5], [42], the agents are partitioned into different classes based on their state-measurements. We specifically show in Section 4.2 that the detection and mitigation logic differs for each class. First, we describe some relevant graph-theoretic notions. In $G_A$, every node represents a state and every link represents a fixed non-zero entry of $A$ ($A_{ij} \neq 0$ implies $j \to i$ as a link from node $j$ to node $i$). In $G_A$, a strongly-connected-component (SCC) is a component in which every node is connected to every other node via a path. Define a parent SCC $\mathcal{S}^p_l$ as an SCC with no out-going links to other SCCs. Further, a contraction $\mathcal{C}_l$ is a set of state nodes that link to strictly fewer nodes than the set contains (as in Fig. 1, where two state nodes contract into one), with $|\cdot|$ as the set cardinality.
Fig. 1. This figure illustrates the proposed agent/measurement classification over a simple system digraph and its associated system matrix: α-agents with outputs $y_1$ and $y_2$ from a contraction (two state nodes contracting into one state node), a β-agent with output $y_4$ from a parent SCC (two linked state nodes with no outgoing link to other components), and a redundant γ-agent (with output $y_3$) which is neither type α nor type β. As illustrated in the zero-nonzero pattern of the system matrix (right), the contraction represents system (structural) rank-deficiency and the parent-SCC is associated with the irreducible block (with zero entries in the upper/lower non-diagonal blocks). See more details in [42].
Based on these graph components, three types of agents are defined as follows,
• α-agent is an agent with measurement of a state node in a contraction $\mathcal{C}_l$.
• β-agent is an agent with measurement of a state node in a parent SCC $\mathcal{S}^p_l$.
• γ-agent is any agent which is neither type α nor β.
An example of such classification is given in Section 5. This partitioning has two advantages: (i) it allows using a different communication topology for different types of agents and a simpler topology design when one or the other type of agents is not present; and (ii) it allows for the attack detection and mitigation strategy to be specifically defined for each type (see details in Section 4.2). In particular, following [85], it can be shown that any α-agent recovers the (structural) rank condition for observability, while the β-agent recovers the output-connectivity of the system digraph [86]. Therefore, both α and β-agents are necessary for observability, while removing (redundant) γ-agents has no effect on system observability. Recall that the structural properties are irrespective of the numerical values of system parameters [40]; therefore, for a structure-invariant matrix $A$ the proposed classification is fixed and time-invariant.
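The classification above can be sketched in a few lines of plain Python. This is an illustrative sketch and not the authors' algorithm: the function names and the toy digraph are hypothetical, the test "a direct output of the state raises the structural rank" is used here as a stand-in for membership in a contraction, and checking type α before type β for a state that qualifies for both is an assumption.

```python
def structural_rank(succ):
    """Maximum matching size between source nodes (columns of A) and
    target nodes (rows of A); succ[j] lists every i with a link j -> i."""
    match = {}                          # target -> source matched to it

    def augment(j, seen):
        for i in succ[j]:
            if i in seen:
                continue
            seen.add(i)
            if i not in match or augment(match[i], seen):
                match[i] = j
                return True
        return False

    return sum(augment(j, set()) for j in range(len(succ)))

def recovers_rank(succ, j):
    """True if a direct output of state j raises the structural rank."""
    augmented = [list(s) for s in succ]
    augmented[j].append("output")       # extra row with one non-zero entry
    return structural_rank(augmented) > structural_rank(succ)

def reachable(succ, start):
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in succ[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def parent_scc_nodes(succ):
    """State nodes lying in an SCC with no out-going link to other SCCs."""
    reach = [reachable(succ, v) for v in range(len(succ))]
    parents = set()
    for v in range(len(succ)):
        scc = {u for u in reach[v] if v in reach[u]}
        if all(w in scc for u in scc for w in succ[u]):
            parents |= scc
    return parents

def classify(succ, measured_state):
    if recovers_rank(succ, measured_state):
        return "alpha"
    if measured_state in parent_scc_nodes(succ):
        return "beta"
    return "gamma"

# Toy digraph: states 0 and 1 both link only to state 2 (a contraction),
# and states 3 and 4 form an SCC with no out-going links.
succ = [[2], [2], [0, 1, 3], [4], [3]]
types = [classify(succ, s) for s in range(5)]
```

Both the matching and the reachability steps run in polynomial time in the number of states, in line with the complexity claim made for the classification.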

Problem Statement
This paper considers a group of sensors/agents taking noise-corrupted measurements in the form (2) of a dynamical system (e.g., social network or power grid) in the form (1) represented by a system digraph $G_A$, see Fig. 2. The agents perform distributed estimation over a network, denoted by $G_N = G_\alpha \cup G_\beta$, to track the state of the noisy dynamical system (1). Note that the networks $G_\alpha$, $G_\beta$, and their union $G_N$ include all the agents of type α, β, and γ. It is assumed that an adversarial attacker aims to add an arbitrary value $\tau^i_k$ (at any time $k$) to make the measurement at (one or more) agent $i$ biased from its original value. Since the dynamical system is not necessarily observable at any agent, the biased measurements (at α/β-agents) affect the estimation error at all agents and result in the degradation of the distributed estimation performance. The problem here is to find a strategy to detect (and isolate) such instantaneous attacks locally at each agent. In particular, we propose a probabilistic detection strategy that returns the probability of attack (at each agent), instead of deterministic strategies returning 0-1 (NoAttack-Attack). The next question addressed in this paper is how to recover the potential loss of observability due to removing the attacked measurement depending on its type (α, β, or γ). Such countermeasures prevent the same adversarial attack by removing the attacked agent/measurement. As explained in Section 4.2, the attacked measurement can be replaced with a new observationally-equivalent one to avoid possible repetitive attacks at the same agent.
Fig. 2. In this work, there exist two graph representations: (i) the system digraph $G_A$ (see Section 2.2), representing the interactions of system states (or system nodes), and (ii) the multi-agent network (denoted by $G_N = G_\alpha \cup G_\beta$). The system digraph models a large-scale state-space system, e.g., social network, power grid, or weather system. The green arrows show the state measurements/outputs which are vulnerable to (possible) adversarial attacks. The agents/sensors are classified based on their state measurements from specific components in $G_A$ (see examples in Section 5) and track the global system state locally (i.e., performing distributed estimation) via sharing information over the network $G_N$. Attacked (biased) measurements may affect the estimation performance at all agents. The proposed algorithm in this work enables each agent to locally detect if its measurement/output is attacked or not, and further provides mitigation techniques for resilient estimation.

Assumptions
(i) The pair $(A, C)$ is observable. The pairs $(A, c_j)$ and $(A, c_{N_\alpha(j)})$ are not necessarily observable at any sensor $j$ or in its neighborhood, denoted by $N_\alpha(j) \cup N_\beta(j)$ (see details in Section 3). This implies that the underlying system $A$ is not necessarily observable in the neighborhood of any agent.
(ii) The noise terms $\boldsymbol{\nu}_k$, $\boldsymbol{\zeta}_k$ are iid Gaussian, see Remark 1.
(iii) The known system matrix $A$ is not necessarily stable, i.e., its spectral radius $\rho(A)$ can be potentially greater than 1. In other words, this paper applies to both stable and unstable systems.
(iv) The adversary can manipulate the state measurements at a subset of sensors by adding an erroneous additive term $\boldsymbol{\tau}_k$ at any time $k$. For example, $\tau^i_k$ can be from a uniform distribution over $[-l_\tau, l_\tau]$ with $l_\tau \gg \|R\|_2$, $l_\tau \gg \|E\|_2$ ($l_\tau \to \infty$ in general), or $\tau^i_k$ can be a fixed value. In general, the term $\tau^i_k$ may be non-zero at some time-instants $k$ (instantaneous attack) and zero at some other times.

DISTRIBUTED ESTIMATION UNDER POSSIBLE MEASUREMENT ATTACKS
In this section, we propose a consensus-based distributed estimation (filtering) protocol over the multi-agent network. The proposed protocol performs one iteration of information sharing and consensus between every two consecutive steps of system dynamics as follows:
$$\hat{x}^i_{k|k-1} = \sum_{j \in N_\beta(i)} W_{ij} A \hat{x}^j_{k-1|k-1}, \qquad (3)$$
$$\hat{x}^i_{k|k} = \hat{x}^i_{k|k-1} + K_i \sum_{j \in N_\alpha(i)} c_j \left( y^j_k - c_j^\top \hat{x}^i_{k|k-1} \right), \qquad (4)$$
where $y^j_k$ is the measurement of agent $j$ at time $k$ that could be attack-corrupted (or biased), $N_\beta(i)$ and $N_\alpha(i)$ are the neighborhoods of agent $i$, respectively, over the networks $G_\beta$ and $G_\alpha$, $K_i$ is the local feedback gain (or the observer gain) matrix at agent $i$, and $\hat{x}^i_{k|k-1}$ and $\hat{x}^i_{k|k}$ are the (column-vector of) estimates of the system state $x_k$ at agent $i$ given the measurements, respectively, up to time $k-1$ and $k$. In fact, $\hat{x}^i_{k|k-1}$ is the a-priori estimate (or prediction) and $\hat{x}^i_{k|k}$ is the a-posteriori estimate after the measurement-update at time-step $k$.
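The single-step structure described above (one consensus-based prediction followed by one measurement update at every agent) can be sketched as follows. The exact update form coded here is an assumption chosen to be consistent with the noise term that appears in the error analysis of this section; the function names, gains, and toy matrices are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def estimator_step(A, W, K, c, alpha_nbrs, x_prev, y):
    """One prediction/update step at every agent.

    x_prev[j]: agent j's a-posteriori estimate at time k-1
    y[j]:      (possibly biased) scalar output of agent j at time k
    """
    n, N = len(A), len(W)
    x_new = []
    for i in range(N):
        # prediction: weighted consensus on neighbours' propagated estimates
        pred = [0.0] * n
        for j in range(N):
            if W[i][j] > 0.0:
                Ax = matvec(A, x_prev[j])
                pred = [p + W[i][j] * a for p, a in zip(pred, Ax)]
        # measurement update using the outputs shared over G_alpha
        innov = [0.0] * n
        for j in alpha_nbrs[i]:
            r = y[j] - sum(cj * p for cj, p in zip(c[j], pred))
            innov = [v + cj * r for v, cj in zip(innov, c[j])]
        corr = matvec(K[i], innov)
        x_new.append([p + d for p, d in zip(pred, corr)])
    return x_new

# Toy check: noise-free, attack-free, estimates initialised at the true state
A = [[0.9, 0.2], [0.0, 1.05]]
c = [[1.0, 0.0], [0.0, 1.0]]            # measurement vectors c_1, c_2
W = [[0.5, 0.5], [0.5, 0.5]]            # row-stochastic fusion weights
K = [[[0.5, 0.0], [0.0, 0.5]]] * 2      # hypothetical local gains
alpha_nbrs = [[0, 1], [0, 1]]
x_true = [1.0, 2.0]
x_next = matvec(A, x_true)
y = [sum(cj * x for cj, x in zip(c[j], x_next)) for j in range(2)]
estimates = estimator_step(A, W, K, c, alpha_nbrs, [x_true, x_true], y)
biased = estimator_step(A, W, K, c, alpha_nbrs, [x_true, x_true],
                        [y[0] + 1.0, y[1]])
```

In the clean run every agent reproduces the true next state, while a unit bias on the first output shifts every agent's estimate through the gain, which illustrates how one attacked measurement propagates over the network.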

Remark 3.
In this work, the combination of the following two graphs forms the multi-agent network: (i) $G_\beta$, over which agents share the estimates $\hat{x}^j_{k-1|k-1}$, and (ii) $G_\alpha$, over which agents share their measurements $y^j_k$. Define matrices $W$ and $U$ as the matrices associated with the graphs $G_\beta$ and $G_\alpha$, respectively. The matrix $U = \{U_{ij}\}$ is the 0-1 adjacency matrix of $G_\alpha$, with $U_{ij} = 1$ associated with the link $j \to i$ in $G_\alpha$ from α-agent $j$ to every agent $i$. The non-zero entries of $W = \{W_{ij}\}$ take values in the range $0 < W_{ij} \leq 1$ associated with the link $j \to i$ in $G_\beta$.
Matrix $W$ is row-stochastic to ensure consensus on a-priori estimates, i.e., $\sum_{j=1}^{N} W_{ij} = \sum_{j \in N_\beta(i)} W_{ij} = 1$ for all $i$. Such a matrix $W$ (and the graph $G_\beta$) can be formed via distributed algorithms in [89]. The structure of $G_\beta$ and $G_\alpha$ (and the associated matrices) needs to be designed properly for bounded steady-state estimation error, see Section 3.1.

Remark 4.
The proposed protocol (3)-(4) is a single time-scale distributed estimator, where the estimation is performed at the same time-scale as the system dynamics. This is in contrast to the double time-scale protocols [28], [53], [54], which require much faster estimation and communication rates than the sampling rate of the system dynamics, and, therefore, demand more costly communication and processing equipment. However, the observability assumption in [28], [53], [54] is similar to Assumption (i), which makes such scenarios suitable for large-scale applications, as is the proposed protocol.
The collective error dynamics of the protocol (3)-(4) follow (5)-(6), where $\boldsymbol{\eta}_k$ collects the noise terms, and $D_C \triangleq (U \otimes 1_n) \circ (1_N \otimes C)$ with "$\circ$" and "$\otimes$", respectively, as the entrywise (Hadamard) and Kronecker product.
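Proof. The error at each agent $i$ is defined as $e^i_k \triangleq x_k - \hat{x}^i_{k|k}$. Recalling the stochasticity of the $W$ matrix, we have $A x_{k-1} = \sum_{j \in N_\beta(i)} W_{ij} A x_{k-1}$. Substituting this along with equations (1)-(2),
$$e^i_k = \Big( I_n - K_i \sum_{j \in N_\alpha(i)} c_j c_j^\top \Big) \sum_{j \in N_\beta(i)} W_{ij} A e^j_{k-1} + \boldsymbol{\eta}^i_k,$$
with $\boldsymbol{\eta}^i_k \triangleq \boldsymbol{\nu}_{k-1} - K_i \sum_{j \in N_\alpha(i)} (c_j \zeta^j_k + c_j \tau^j_k + c_j c_j^\top \boldsymbol{\nu}_{k-1})$. Using the definition of Kronecker and entrywise products, the collective error and noise term follow Eq. (5)-(6).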

Error Stability
The following lemma establishes the stability condition of the error dynamics (5)-(6).
Lemma 1. The necessary condition for error dynamics (5)-(6) to be stable is that the pair $(W \otimes A, D_C)$ is observable.
Proof. The proof follows the Kalman stability theorem on the error dynamics (6). More information can be found in [86], [90], [91] on error stability of linear observer design.
Note that $(W \otimes A, D_C)$-observability is also referred to as distributed observability [68]. Using structured systems theory (generic analysis), distributed observability can be formulated as the observability of the Kronecker product of the graphs $G_A$ and $G_\beta$. Following the observability analysis of Kronecker composite networks in [92], the following lemma determines the sufficient connectivity of $G_\beta$ and $G_\alpha$.

Lemma 2. The pair (W ⊗ A, D C ) is observable if and only if the following conditions hold:
1) $G_\beta$ is strongly-connected (SC) with a self-link at each agent, which further implies that $W$ is irreducible.
2) $G_\alpha$ is a hub-network in which every α-agent is a hub, i.e., there is a directed link from every α-agent to every other agent in $G_\alpha$. Further, $i \in N_\alpha(i)$ for every agent $i$.
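The two connectivity conditions of Lemma 2 are simple to check numerically. The sketch below is only illustrative: the function names are hypothetical, and the uniform weights $W_{ij} = 1/|N_\beta(i)|$ are one assumed way to obtain a row-stochastic $W$ (the paper refers to [89] for distributed constructions).

```python
def uniform_row_stochastic(beta_nbrs):
    """Assumed weight choice: W_ij = 1/|N_beta(i)| for j in N_beta(i)."""
    N = len(beta_nbrs)
    return [[1.0 / len(beta_nbrs[i]) if j in beta_nbrs[i] else 0.0
             for j in range(N)] for i in range(N)]

def is_strongly_connected(in_nbrs):
    """in_nbrs[i] lists the agents j with a link j -> i."""
    N = len(in_nbrs)
    out_nbrs = [[i for i in range(N) if j in in_nbrs[i]] for j in range(N)]

    def covers_all(adj):
        seen, stack = {0}, [0]
        while stack:
            v = stack.pop()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return len(seen) == N

    # agent 0 must reach every agent and be reached by every agent
    return covers_all(out_nbrs) and covers_all(in_nbrs)

def satisfies_lemma2(beta_nbrs, alpha_nbrs, alpha_agents):
    N = len(beta_nbrs)
    self_links = all(i in beta_nbrs[i] for i in range(N))
    hubs = all(j in alpha_nbrs[i] for i in range(N) for j in alpha_agents)
    own = all(i in alpha_nbrs[i] for i in range(N))
    return self_links and is_strongly_connected(beta_nbrs) and hubs and own

# Three agents on a directed cycle with self-links; agent 0 is the alpha-agent
beta_nbrs = [[0, 2], [1, 0], [2, 1]]
alpha_nbrs = [[0], [0, 1], [0, 2]]
W = uniform_row_stochastic(beta_nbrs)
ok = satisfies_lemma2(beta_nbrs, alpha_nbrs, alpha_agents=[0])
broken = satisfies_lemma2([[0], [1, 0], [2, 1]], alpha_nbrs, alpha_agents=[0])
```

Removing the link into agent 0 breaks strong connectivity of $G_\beta$, so the second configuration fails the check even though the hub condition on $G_\alpha$ still holds.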
Proof. We provide the sketch of the proof here and refer the interested reader to [92] for more details. For (structural) observability two conditions on the associated composite graph need to be satisfied [86], [88]: (i) the output connectivity condition, implying the existence of a directed path from every state node in the system graph G A to an agent (output), and (ii) the rank condition, implying a direct output of (at least) one state node in every contraction in G A for system-output rank recovery. In this work, the global system graph associated with W ⊗ A is the Kronecker-product of G A and G β . Recall that for (W ⊗ A, D C )-observability (or distributed observability) the global system state must be observable to every agent. Therefore, to satisfy condition (i), every state node needs to be connected via a directed path to every agent, which justifies strong-connectivity of G β . On the other hand, to satisfy condition (ii), the outputs from state nodes measured by all α-agents (including one node in every contraction) need to be directly shared among all agents to recover their system-output rank. This implies that for any α-agent j, we have j ∈ N α (i), ∀i ∈ {1, . . . , N }. This justifies the connectivity of G α , and completes the proof.
With $G_\beta$ and $G_\alpha$ satisfying the conditions in Lemma 2, the block-diagonal gain matrix $K$ can be designed such that $\rho(\hat{A}) < 1$, i.e., the error-dynamics matrix $\hat{A}$ is a Schur matrix. In fact, the gain matrix $K$ is known to be the solution to the Linear-Matrix-Inequality (LMI) $X - \hat{A}^\top X \hat{A} \succ 0$ (or an equivalent form) for some $X \succ 0$ (where "$\succ$" denotes positive-definiteness). However, to satisfy the distributed condition, $K$ needs to be further block-diagonal in order to satisfy information locality. Following [91], [93], an iterative cone-complementarity optimization method is adopted to design the proper $K$ matrix with polynomial-order complexity. Applying such a $K$ matrix, we have $\rho(\hat{A}) < 1$, which implies stability and steady-state boundedness of the error in the attack-free case.
Lemma 4. Define $Q_k := E(e_k e_k^\top)$ and $\Phi := E(\boldsymbol{\eta}_k \boldsymbol{\eta}_k^\top)$. Let $Q_\infty = \lim_{k \to \infty} Q_k$ denote the collective error covariance at the steady-state. For error dynamics (5) in the attack-free case, with
Proof. Following [87] with $\|\hat{A}\|_2 \triangleq b$, From (6) we have, Then, from (6), Using the fact that $\|1_{NN} \otimes E\|_2 = N \|E\|_2$, and applying equation (11) results in (10).
In fact, Lemma 3 implies that the estimator (3)-(4) is unbiased in the absence of attacks, while Lemma 4 states that its mean-square estimation error (also known as mean-square deviation [27]) is bounded in steady-state.

MAIN ALGORITHM
We now describe the attack detection logic. Define the residual at every agent $i$ as the absolute difference between the original output $y^i_k$ and the estimated output,
$$r^i_k = |y^i_k - c_i^\top \hat{x}^i_{k|k}| = |c_i^\top e^i_k + \zeta^i_k + \tau^i_k| = |c_i^\top \hat{A}_i e_{k-1} + c_i^\top \boldsymbol{\eta}^i_k + \zeta^i_k + \tau^i_k|, \qquad (14)$$
with $\hat{A}_i$ as the $i$-th block-row of $\hat{A}$. Note that the residual defined above based on the absolute value is a standard definition, which is irrespective of the attack being positive ($\tau^i_k > 0$) or negative ($\tau^i_k < 0$) and works for both sign-preserving and sign-changing attacks. As shown in Lemmas 3 and 4, in the attack-free case with $\tau^i_k = 0$, the estimation error $e^i_k$, and therefore the residual $r^i_k$, is unbiased and bounded in steady-state at all agents. Note that in general $\hat{A}_i e_{k-1} \to 0$ due to Schur stability of $\hat{A}$, while the second term in (14) is,
$$c_i^\top \boldsymbol{\eta}^i_k = c_i^\top \boldsymbol{\nu}_{k-1} - c_i^\top K_i \sum_{j \in N_\alpha(i)} (c_j \zeta^j_k + c_j \tau^j_k + c_j c_j^\top \boldsymbol{\nu}_{k-1}). \qquad (15)$$
In case of an attack on agent $i$, i.e., $\tau^i_k \neq 0$, the term $c_i^\top \boldsymbol{\eta}^i_k$ is biased at agent $i$. This biased residual can be used to find (isolate) the attacked agent. In this sense, first, we need to define a threshold on the residuals to distinguish the effect of noise terms (in the absence of attacks) from the biasing attacks.

Probabilistic Threshold Design
Here, the probabilistic detection thresholds are defined based on Q ∞ in (10). For each agent define, Then, for specific false alarm rates and attack detection probabilities κ, one can consider different detection-levels m ∈ R >0 as described in Fig. 3. A detection-level m represents a specific probability threshold κ associated with the Gaussian PDF of the estimation error in the attack-free case. Then, the thresholds θ κ are designed as follows.

Lemma 5.
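Following the assumptions in Section 2.4, given the noise covariances $R$ and $E$ and the residuals $r^i_k$ from Eq. (14), the attack detection threshold for a detection-level $m \in \mathbb{R}_{>0}$ is,
$$\theta_\kappa \triangleq m \Theta^i_2 = m (|c_i| \Theta_1 + R_{ii}), \qquad (17)$$
where $\kappa = \mathrm{erf}(m/\sqrt{2})$ is the detection probability (with $\mathrm{erf}(\cdot)$ as the Gauss error function), $c_i$ is the measurement column-vector at agent $i$, and $\Theta_1$ follows (16).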
Proof. The proof directly follows from Lemmas 3 and 4 and the results in [87]. From Lemmas 3 and 4, $E(e^i_k) = 0$ for the attack-free case, and following the zero-mean Gaussian distribution of the noise terms in $\boldsymbol{\eta}_k$ (including $\boldsymbol{\nu}_k$ and $\boldsymbol{\zeta}_k$) and the linearity of the error dynamics (5)-(6) and the protocol (3)-(4), it is straightforward to see that $e^i_k$ and $r^i_k$ are Gaussian; see details in [87]. Then, from standard textbooks on the Gaussian distribution (e.g., [94]) and Eq. (14) in the attack-free case, the probability of $|r^i_k| \leq m \Theta^i_2$ with $\Theta^i_2 = |c_i| \Theta_1 + R_{ii}$ is determined via the value of the normal deviate less than $m \Theta^i_2$, i.e., $\kappa = \mathrm{erf}(m/\sqrt{2})$. Recall that $\Theta^i_2$ is the residual variance and $R_{ii}$ is the measurement noise variance at agent $i$. Then, in the presence of an attack, both the error $e^i_k$ and the residual $r^i_k$ are biased by some products of $\tau^i_k \neq 0$ (due to linearity). In this case, the residual follows a biased Gaussian distribution with non-zero mean. Following statistical hypothesis testing for the two Gaussian distributions with equal variance (assuming equally likely a-priori hypotheses), if the residual $|r^i_k|$ is greater than $m \Theta^i_2$ then the probability of attack is $\kappa$ and the probability of false alarm is $1 - \kappa$. This justifies the probability thresholds $\theta_\kappa \triangleq m \Theta^i_2$ (as illustrated in Fig. 3) and completes the proof.
The parameter $m$ in (17) and Lemma 5 can take any real (or integer) value in $\mathbb{R}_{>0}$. Some typical threshold probability values $\kappa$ for integer values of $m$ are given in Table 2. Clearly, higher values of $m$ (and $\kappa$) imply lower false-alarm rates.

TABLE 2
Typical threshold probabilities $\kappa$ for integer values of $m$ [recoverable entries: ...7 %, 99.99 %, 99.9999 %].

Fig. 3. This figure illustrates the attack detection logic in Lemma 5. The confidence intervals for the normalized residual in the absence of attack (blue curve) are shown. Each value of $m$ in Eq. (17) associated with a confidence interval represents a probability threshold $\kappa$ associated with the Gaussian PDF of the residual. As an example, the red and green lines represent two normalized residual values $r_k / \Theta_2$ via Eq. (14) and (17). Following binary hypothesis testing (maximum-likelihood case), the threshold on the residual is the intersection (midpoint) of the two PDFs, where the residual belongs to the PDF in the presence of attack (red curve). Since the residual is over the threshold $\theta_\kappa$ with $m = 2$ ($r_k > 2\Theta_2$), the probability of attack is more than $\kappa = 95.4\%$. This probability is equal to the red shaded area (since both PDFs follow the same normal distribution), while the probability of false alarm ($1 - \kappa = 4.6\%$) is shaded in blue. Clearly, this gives the highest probability of detection, while for higher threshold values (larger $\kappa$) the residual is not detected as biased/attacked. For the green residual with $|r_k| < \Theta_2$, the residual is most likely due to (system/measurement) noise, which is also evident from the (blue) PDF. Recall that, in general, $m$ may take (positive) real values over an infinite range ($m \to \infty$).
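The relation $\kappa = \operatorname{erf}(m/\sqrt{2})$ from Lemma 5 can be evaluated numerically for any detection-level. The following is a minimal Python sketch, not part of the original protocol; the function names and the example scale `theta2` are illustrative assumptions, and $\Theta_2^i$ is treated as the scale of the attack-free residual, which is what the erf relation presupposes.

```python
import math


def detection_probability(m: float) -> float:
    """kappa = erf(m / sqrt(2)) for a detection-level m > 0 (Lemma 5)."""
    return math.erf(m / math.sqrt(2.0))


def threshold(m: float, theta2: float) -> float:
    """Threshold theta_kappa = m * Theta_2 for a given residual scale theta2."""
    return m * theta2


# Probability levels for the integer detection-levels m = 1, ..., 5.
levels = {m: detection_probability(m) for m in range(1, 6)}

for m, kappa in levels.items():
    print(f"m = {m}: kappa = {100 * kappa:.5f} %, "
          f"false alarm = {100 * (1 - kappa):.5f} %")
```

Non-integer levels are handled identically, e.g. `detection_probability(2.5)`.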

Remark 5.
A straightforward sequel to Lemma 5 is that one can design the threshold $\theta_\kappa$ for a given false-alarm rate $\bar{\kappa} = 1 - \kappa$ as

$$\theta_\kappa = \sqrt{2}\,\operatorname{erf}^{-1}(1 - \bar{\kappa})\,\Theta_2^i.$$

The magnitude of the residual $r_k^i$ is tightly related to the magnitude of the biasing attack $\tau_k^i$. In other words, a greater measurement bias $\tau_k^i$ results in a greater residual $r_k^i$, which exceeds the threshold $\theta_\kappa$ with higher attack probability $\kappa$ and lower probability of false alarm $\bar{\kappa} = 1 - \kappa$.
Recall from Remark 1 that, unlike [55]-[59], which consider a fixed (deterministic) threshold based on the upper bound on $\boldsymbol{\zeta}_k$, Eq. (17) assigns probability $\kappa$ to the threshold $\theta_\kappa$ with no such upper-bound assumption on the noise terms, implying the probabilistic threshold design.
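Inverting $\kappa = \operatorname{erf}(m/\sqrt{2})$ for a prescribed false-alarm rate $1 - \kappa$ gives the detection-level, and hence the threshold, directly. A minimal Python sketch under stated assumptions: the function names are hypothetical, and the inverse error function is obtained from the standard normal quantile via $\operatorname{erf}(m/\sqrt{2}) = 2\Phi(m) - 1$, so that $m = \Phi^{-1}(1 - (1-\kappa)/2)$.

```python
import math
from statistics import NormalDist


def detection_level(false_alarm: float) -> float:
    """Detection-level m whose two-sided Gaussian tail mass equals false_alarm."""
    if not 0.0 < false_alarm < 1.0:
        raise ValueError("false_alarm must lie strictly between 0 and 1")
    # kappa = 2 * Phi(m) - 1  =>  Phi(m) = 1 - false_alarm / 2
    return NormalDist().inv_cdf(1.0 - false_alarm / 2.0)


def threshold_for_false_alarm(false_alarm: float, theta2: float) -> float:
    """Threshold m * Theta_2 for a prescribed false-alarm rate (theta2 is the residual scale)."""
    return detection_level(false_alarm) * theta2


m = detection_level(0.05)          # level for a 5 % false-alarm rate
kappa = math.erf(m / math.sqrt(2)) # round-trip check against Lemma 5
print(m, kappa)
```

Each agent may use its own false-alarm rate and residual scale, since both arguments are local quantities.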

Attack Detection and Mitigation Logic
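Recall that, following Lemma 2, the connectivity of the $\alpha$, $\beta$, and $\gamma$-agents over $\mathcal{G}_\alpha$ and $\mathcal{G}_\beta$ results in the next lemma.

Lemma 6.
Following the connectivity condition in Lemma 2 and the residual formulations in (14)-(15):
(i) In case of having no $\alpha$-agent$^{2}$, an attack at any $\beta$ or $\gamma$-agent is isolated.
(ii) For isolation of an attack in the presence of an $\alpha$-agent $j$, the gain matrix $K$ needs to satisfy

$$\frac{|c_i^\top K_i c_j|}{|c_j^\top K_j c_j - 1|} \leq \epsilon, \qquad i \neq j,\ j \in \mathcal{N}_\alpha(i),$$

where $0 \leq \epsilon < 1$ is a pre-specified constant determining the residual ratio.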
2. Number of α-agents is equal to the rank-deficiency of the system matrix A [85]. Therefore, for a full-rank system the associated distributed estimator has no α-agent [71].
Proof. From Lemma 2, in the absence of any $\alpha$-agent, $\mathcal{N}_\alpha(i) = \{i\}$ for any agent $i$ of type $\beta$ or $\gamma$. Thus, from (14)-(15), a biasing attack $\tau_k^i \neq 0$ at a $\beta$ or $\gamma$-agent $i$ only affects the residual $r_k^i$. This implies that $r_k^i$ is biased while $r_k^j$ ($j \neq i$) is unbiased, implying that the attack $\tau_k^i$ is isolated at any $\beta$/$\gamma$-agent. On the other hand, in the presence of an $\alpha$-agent $j$ subject to attack $\tau_k^j \neq 0$, Eq. (14)-(15) imply that the residual $r_k^i$ at every agent $i$ is affected by the attack at agent $j \in \mathcal{N}_\alpha(i)$ via the term $c_i^\top K_i c_j$, while the residual $r_k^j$ at $\alpha$-agent $j$ is affected by the factor $c_j^\top K_j c_j - 1$. Therefore, the gain condition in part (ii) bounds the attack-induced bias at any agent $i \neq j$ by $\epsilon$ times that at $\alpha$-agent $j$, implying a greater residual at $\alpha$-agent $j$ by a factor of $\frac{1}{\epsilon}$. This constraint ensures that the attack can be isolated at every $\alpha$-agent $j$.
Following Lemmas 5 and 6, for the attacked agent $i$ (of any type) the residual $r_k^i$ is (more) biased over $\theta_\kappa$ in (17), while the residuals at other agents are less biased (or unbiased). The largest $\kappa$ such that $|r_k^i| \geq \theta_\kappa$ gives the probability of attack (with probability of false alarm $1 - \kappa$). Likewise, from Remarks 5 and 6, the attack detection logic can be designed for a given false-alarm rate $\bar{\kappa}_i$ (and probabilistic threshold $\theta_{\bar{\kappa}_i}$) at sensor $i$. Then, similar to the deterministic case, the following hypothesis testing locally declares "Attack" or "No-Attack" at sensor $i$ (under a certain false-alarm rate $\bar{\kappa}_i$),

$$\mathcal{H}_0:\ |r_k^i| < \theta_{\bar{\kappa}_i} \ \ \text{(No-Attack)}, \qquad \mathcal{H}_1:\ |r_k^i| \geq \theta_{\bar{\kappa}_i} \ \ \text{(Attack)}.$$

Remark 7.
A relevant concept is nodal/local consistency of the measurement/prediction information (data) sets at agents $i$ and $j \in \mathcal{N}_\alpha(i) \cup \mathcal{N}_\beta(i)$ at every time $k$, denoted by $I_k^i$, $I_k^j$ [95]. Recall that nodal consistency checks the statistical consistency of $I_k^i$ with the information $I_{[k-T,k]}^i$ over a sliding time-window $T$, declaring whether $I_k^i$ is trustworthy or not. In this direction, one can track the information over such a time-window $T$ and apply, for example, a chi-square detector on the residuals over $T$ [16] instead of the instantaneous residuals (14). Local consistency, on the other hand, checks the statistical consistency of the common information (e.g., on the shared observable subspace) between $I_{[k-T,k]}^i$ and the received information $I_{[k-T,k]}^j$, $j \in \mathcal{N}_\alpha(i) \cup \mathcal{N}_\beta(i)$, and declares whether $I_k^j$ is trustworthy or not. Note that for (necessary) $\alpha$/$\beta$-agents, weak local consistencies imply a certain loss of observability information and degradation of estimation performance.
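The instantaneous local test stated before Remark 7 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the numerical values of the residuals and of the scale `theta2` are hypothetical, and $\Theta_2^i$ is again assumed to act as the scale of the attack-free residual.

```python
import math


def declare(residual: float, theta2: float, m: float) -> str:
    """Local test at one agent: 'Attack' if |r| >= m * theta2, else 'No-Attack'."""
    return "Attack" if abs(residual) >= m * theta2 else "No-Attack"


def attack_probability(residual: float, theta2: float) -> float:
    """Largest kappa = erf(m / sqrt(2)) whose threshold m * theta2 the residual still reaches."""
    m_max = abs(residual) / theta2
    return math.erf(m_max / math.sqrt(2.0))


theta2 = 0.5                 # hypothetical residual scale at this agent
for r in (1.2, 0.3):         # hypothetical residual samples
    print(r, declare(r, theta2, m=2.0), attack_probability(r, theta2))
```

A windowed variant in the spirit of Remark 7 would replace the single residual with a statistic accumulated over the last $T$ samples.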

Remark 8. (Attack mitigation)
From Section 2.2, $\alpha$/$\beta$-agents are necessary for observability; therefore, in case of attacks, their erroneous information on their observable subsystems makes those subsystems unobservable to all agents, causing unstable estimation error. To recover the loss of observability, recall that the states in the same parent SCC $S_l^p$ and in the same contraction $C_l$ are observationally-equivalent, in the sense that measurements of two states in $S_l^p$ or in $C_l$ provide information on the same observable subsystem. In other words, the information $I_k^i$, $I_k^j$ offered by two state measurements (agents $i$, $j$) are said to be observationally-equivalent if they equally contribute to the rank recovery of the observability Gramian (see the detailed definition in [42], [84]). In this regard, for attack mitigation, the biased measurement can be replaced with a new measurement of an observationally-equivalent state in $S_l^p$ or $C_l$. Note that, after mitigating the attacks, the performance analysis follows as in Section 3.2.

(Cost-optimal mitigation) Given an observationally-equivalent set of state nodes $S_l^p$ or $C_l$, the substitute/replacement state measurement can be chosen based on its sensing cost. Combinatorial optimization strategies [96], e.g., the well-known Hungarian algorithm, can be adopted to find the minimal-cost equivalent measurement to reduce the overall sensing cost. Similar arguments hold for cost-optimal design of the multi-agent network $\mathcal{G}_N = \mathcal{G}_\alpha \cup \mathcal{G}_\beta$, e.g., using the so-called minimum spanning strong sub-graph algorithm [97].
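For a single attacked measurement, the cost-based choice of a substitute reduces to picking the cheapest remaining state in the same observationally-equivalent set. The Python sketch below illustrates this special case only; it does not implement the Hungarian algorithm needed when several attacked measurements compete for substitutes, and the function name, the state indices, and the sensing costs are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set


def cheapest_substitute(
    equivalent_states: Set[int],
    attacked_state: int,
    cost: Dict[int, float],
    already_measured: FrozenSet[int] = frozenset(),
) -> Optional[int]:
    """Return the minimum-cost state in the same equivalent set (parent SCC or
    contraction) that is neither the attacked state nor already measured."""
    candidates = sorted(
        s for s in equivalent_states
        if s != attacked_state and s not in already_measured
    )
    if not candidates:
        return None  # no observationally-equivalent replacement is available
    return min(candidates, key=lambda s: cost[s])


# Hypothetical equivalent set and sensing costs.
parent_scc = {1, 2, 3}
sensing_cost = {1: 1.0, 2: 4.0, 3: 2.5}
substitute = cheapest_substitute(parent_scc, attacked_state=1, cost=sensing_cost)
print(substitute)
```

Returning `None` signals that the attacked measurement has no equivalent replacement, in which case the observability loss cannot be recovered by this substitution.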
Remark 8 along with Lemmas 5 and 6 results in Algorithm 1. Note that the terms $D_C$ in (5) and $R$ in (16) are defined locally, i.e., the $i$-th diagonal blocks of $D_C$ and $R$ related to agent $i$ are defined based on the measurement information $c_j$ and $R_j$ received from its direct neighbors (the summation is over $j \in \mathcal{N}_\alpha(i)$). Therefore, the calculations of these terms are distributed and localized over the network. The thresholds $\theta_\kappa$ in (17), the agent types, and the sets of observationally-equivalent states in the system digraph $\mathcal{G}_A$ are determined by a central entity once off-line and then broadcast to every agent. This procedure is done once and the information is stored at all agents; the agents can then perform estimation and detect attacks locally with no further role for the centralized entity. See similar assumptions in [87], [91] for distributed estimation/filtering.
3. The auto-regressive attack is given as an example of a possible extension of the results to the case of non-stationary attacks, where the attack probabilities can be approximated by Lemma 5.

TABLE 3
Parameter values for the detection and estimation protocol in [53], [54].

Fig. 5. (Left) Results for the protocol in [53], [54]. The attack is detected via the threshold $\Phi$. Clearly, the protocol in [53], [54] with the parameters given in Table 3 detects both attacks at agents $\alpha_2$ and $\beta$, while also raising a false alarm on agent $\alpha_1$ at some times. In contrast, our proposed detection strategy only raises an alarm on the attacked agents, as shown in Fig. 4-(Right). (Right) Using Algorithm 1, the detected attacks are mitigated by adding equivalent agents $\alpha_2$ and $\beta$ to recover the loss of observability. The new agents $\alpha_2$ and $\beta$ measure observationally-equivalent states, respectively, in the same contraction (green nodes) and in the same parent SCC (purple nodes).

Comparison with recent literature: next, we use the estimation and detection strategy in [53], [54] for comparison. Recall from Remark 4 that the distributed observer in [53], [54] is a double time-scale protocol, which requires many iterations of consensus between every two time-steps of the system dynamics. Therefore, it needs a much faster information sharing/processing rate than the proposed protocol (3)-(4). The reason for choosing [53], [54] for the comparison study is that double time-scale protocols make a relaxed observability assumption similar to Assumption (ii) in Section 2.4 (irrespective of system rank-deficiency). This is in contrast to many existing single time-scale protocols, e.g., [26], [27], [31]-[35], which assume that the underlying system is observable in the neighborhood of each agent and/or is full-rank. In other words, the mentioned references generally require more network connectivity and, therefore, do not result in steady-state stable error over the given $\mathcal{G}_\alpha$ and $\mathcal{G}_\beta$ networks in Fig. 4-(Left). We set the parameters in [53], [54] as in Table 3 (which seem to provide the best outcome). In this simulation, agents need to perform $L = 40$ consensus iterations for estimation/detection, which requires a 40-times faster communication and computation rate than the proposed protocol (3)-(4). The results are shown in Fig. 5-(Left). Following the attack detection logic in [53], [54], the agents can detect possible attacks if their measurement-updates are over a certain threshold $\Phi$. From Fig. 5-(Left), both attacks are detected, while a false alarm is also raised at agent $\alpha_1$ at some times.

Attack mitigation and performance analysis: next, using the mitigation strategy in Algorithm 1, we replace the detected attacked agents $\beta$ and $\alpha_2$ with substitute agents $\beta$ and $\alpha_2$, respectively measuring the observationally-equivalent state 3 in $S_1^p$ and state 8 in $C_2$. The connectivity of the new agents follows the same connectivity of $\mathcal{G}_\alpha$ and $\mathcal{G}_\beta$, as shown in Fig. 5-(Right). We perform a Monte-Carlo simulation (averaged over 100 repetitions) of the proposed protocol (3)-(4) for the attack-mitigated case of Fig. 5-(Right).
The mean-square performance and mean performance are shown in

CONCLUSION
This paper considers decentralized attack detection over distributed estimation networks. The detection, isolation, and mitigation strategy is designed specifically for $\alpha$, $\beta$, and $\gamma$-agents with polynomial-order complexity. As a future research direction, network reconfiguration [11], [99] to reduce attack vulnerability and the design of attack-tolerant/resilient engineered networks are promising. Further, one can track the history of residuals (for general rank-deficient systems) over a sliding time-window (known as stateful detection [36]), similar to $\chi^2$-detection [16] or trust-index evolution [95].