Quickest Detection of Dynamic Events in Networks

The problem of quickest detection of dynamic events in networks is studied. At some unknown time, an event occurs, and a number of nodes in the network are affected by the event, in that they undergo a change in the statistics of their observations. It is assumed that the event is dynamic, in that it can propagate along the edges in the network, and affect more and more nodes with time. The event propagation dynamics is assumed to be unknown. The goal is to design a sequential algorithm that can detect a"significant"event, i.e., when the event has affected no fewer than $\eta$ nodes, as quickly as possible, while controlling the false alarm rate. Fully connected networks are studied first, and the results are then extended to arbitrarily connected networks. The designed algorithms are shown to be adaptive to the unknown propagation dynamics, and their first-order asymptotic optimality is demonstrated as the false alarm rate goes to zero. The algorithms can be implemented with linear computational complexity in the network size at each time step, which is critical for online implementation. Numerical simulations are provided to validate the theoretical results.


Introduction
In the problem of quickest change detection (QCD), a stochastic system is observed sequentially. At some unknown time, a change occurs that changes the data generating process. Observations are taken sequentially with time, and the objective is to detect the change as quickly as possible subject to false alarm constraints (see [2][3][4][5] for an overview). The QCD framework models a wide range of applications, e.g., fraud detection, intrusion detection, environmental monitoring, line outage detection in power systems, quality control in online manufacturing systems and spectrum monitoring in wireless communications. However, in many applications, e.g., epidemic detection [6,7], opinion mining in social networks [8], anomalous event detection in sensor networks (e.g., internet of battlefield things) [9], detection of malicious code spreading in computer networks [10], the data are usually collected from networks with certain underlying topologies. Following the occurrence of an event, it propagates dynamically across the network, affects more and more nodes, and changes their data generating behaviors with time (see Fig. 1). The propagation dynamics is usually unknown in practice, and depends on the underlying network topology.
Motivated by these applications, we study the problem of quickest detection of dynamic events in networks. Suppose a network is monitored in real time by a set of L nodes that communicate with a fusion center. At some unknown time, an event occurs in the network that causes eventual changes in the observations of a connected subset of nodes. The event occurs at a connected subset of nodes and then dynamically propagates along the edges in the network, and the affected nodes form a connected sub-graph, the size of which grows with time. The propagation dynamics are assumed to be unknown, i.e., the set of nodes and the order in which they are affected are unknown. We are interested in detecting a "significant" event, i.e., one that affects η ≥ 1 nodes as quickly as possible, subject to false alarm constraints.

Related Works
The problem in this paper is closely related to the problem of QCD under the multi-channel setup, in which one or multiple unknown nodes perceive a change simultaneously [11][12][13][14][15], or alternatively, at different times [16][17][18]. The major differences from these previous works lie in that: (i) we are interested in detecting whether the event has affected at least η nodes, i.e., the event is "significant" enough, whereas previous works focus on the special case with η = 1, i.e., whether the event has occurred or not; (ii) instead of considering the worst-case performance over all possible times that the nodes are affected [18] or taking a Bayesian approach [16,17], we assume that the times that the nodes are affected are deterministic and unknown, and we are interested in designing algorithms that adapt to unknown propagation dynamics; and (iii) we consider structured networks, over which events can only propagate along network edges.
On a temporal scale, the data generating distribution of the whole network dynamically changes over time. As the event affects more nodes with time, the network goes through multiple transient phases in which the sets of affected nodes are different. This is related to the problem of QCD under transient dynamics [19][20][21], where after an event occurs, the pre-change distribution does not change to a persistent post-change distribution instantaneously, but only after a number of transient phases. Each transient phase is associated with a distinct data generating distribution. In [19][20][21], it is assumed that the number of transient phases and the data generating distributions associated with each phase are known. In this paper, event propagation dynamics are unknown. Therefore, the results in [19][20][21] cannot be directly applied to solve the problem here. Moreover, in [20] a Bayesian approach is employed, where it is assumed that the transient durations are geometrically distributed with statistics known to the decision maker. In contrast to [20], in this paper, we make no probabilistic assumptions on the times at which the nodes are affected.
The data generating distributions for the whole network before and after η nodes are affected are both composite, i.e., belong to a set of distributions, as they are determined by the unknown subset of affected nodes with unknown change times. Therefore, the problem in this paper is related to the problem of QCD with composite pre-/post-change distributions [22][23][24][25]. However, our problem differs from these works in the following ways. First, the data generating distribution before η nodes are affected is composite, whereas in [22,23,25], only the post-change distribution is composite. Second, our test statistics can be computed efficiently at each time step with a computational complexity linear in the network size (number of nodes), whereas a sliding window approach is usually used to control the computational cost in [22][23][24][25]. Third, in our problem, the distribution of samples before and after the nodes are affected by the event are arbitrary (not necessarily belonging to an exponential family), and the parameters of the data generating distribution for the whole network (times at which the nodes are affected by the event) are discrete, and do not necessarily belong to a compact parameter space.
The offline setting of our problem has been extensively studied in the literature [26][27][28][29][30][31][32][33][34]. The aim of these works is to detect whether there exists a subset of nodes in the network, which have certain geometric structures (e.g., connected subgraphs), and the observations received by this subset of nodes are generated from a distribution different from the one that generates the samples for the rest of the nodes in the network. Our problem is the online dynamic version with a growing anomalous geometric structure. Hence, a small computational complexity at each time step is important in order for making timely decisions. However, many previous works are based on the scan statistic that scans over all connected subgraphs, which is computationally inefficient for large networks, and thus cannot be directly applied to our online setting. As will be shown later, our algorithms can be updated recursively with computational complexity linear in the network size at each time step, and we do not reprocess the previous data over and over again.

Contributions
In this paper, we start with fully connected networks, and then extend to arbitrarily connected networks.
For fully connected networks, the event can propagate from any node to any other node. Then, the algorithm design does not need to account for the fact that the event only propagates along the edges in the network, and the network structure does not matter. This simplifies the problem to one where we are simply interested in detecting when an arbitrary subset of η nodes has been affected by the event.
For fully connected networks, we solve the QCD problem by reformulating it as a dynamic composite hypothesis testing problem, where we distinguish between two hypotheses at each time instant. The null hypothesis corresponds to the case that less than η nodes are affected; and the alternative hypothesis corresponds to the case that at least η nodes are affected. The data generating distributions for the whole network before and after η nodes are affected are both composite, as they are determined by the unknown subset of affected nodes with unknown change times. We take the generalized log-likelihood ratio between the two composite hypotheses as the detection statistic, and compare it to a positive threshold. If it is greater than the threshold, then we stop and raise an alarm; otherwise, we take another sample from each node in the network.
We show that the generalized log-likelihood ratio test is equivalent to one that compares the sum of the smallest L − η + 1 local Cumulative Sum (CuSum) statistics [35] to the same threshold. The resulting algorithm, which we refer to as Spartan-CuSum (S-CuSum), is computationally efficient with O(L) complexity at each time step. This guarantees that the algorithm can be implemented efficiently in an online fashion for large networks. We further show that the S-CuSum algorithm satisfies the false alarm constraints with a properly chosen threshold for all scenarios with fewer than η affected nodes, and adapts to unknown event propagation dynamics. We then establish the asymptotic optimality of the S-CuSum algorithm up to a first-order approximation as the false alarm rate goes to zero.
For arbitrarily connected networks, the S-CuSum is still applicable. However, it does not account for the fact that the event only propagates along the edges in the network, and thus will trigger more false alarms. A direct generalization of the generalized log-likelihood ratio test involves a complicated statistic that scans over all connected sub-graphs and propagation dynamics at each time step. This is computationally intractable for a large arbitrarily connected network, especially under the online setting. We then construct an algorithm based on a thresholding approach, the breadth-first search (BFS) algorithm and the S-CuSum algorithm, which we refer to as the Network-CuSum (N-CuSum) algorithm. We show that the computational complexity of the N-CuSum algorithm is linear in the network size at each time step, and that it is asymptotically optimal up to a first-order approximation as the false alarm rate goes to zero. Moreover, we show through our numerical results that the N-CuSum algorithm, which accounts for the network structure, has a better performance than the S-CuSum algorithm. Both the S-CuSum algorithm and the N-CuSum algorithm are better than the generalized multi-chart CuSum algorithm [36], which stops when at least η local CuSum statistics cross their individual thresholds, and the network generalized multi-chart CuSum algorithm, which stops when the local CuSum statistics of at least η connected nodes cross their individual thresholds simultaneously.

Paper Organization
This paper is organized as follows. In Section 2, we formulate the problem mathematically. In Section 3, we focus on fully connected networks, present the S-CuSum algorithm, and present a method to choose the threshold to satisfy the false alarm constraints. In Section 4, we demonstrate the asymptotic optimality of the S-CuSum algorithm. In Section 5, we study arbitrarily connected networks, present the N-CuSum algorithm and demonstrate its asymptotic optimality. In Section 6, we present numerical results. Finally, in Section 7, we discuss some potential extensions.

Problem Formulation
Consider a network monitored in real time by a set of L nodes. We use an unweighted, undirected graph G = (V, E) to denote the underlying structure of the network. Here, L = |V |. In practice, an edge connecting two nodes may be due to the fact that two nodes communicate with each other, or are geometrically close to each other.
Before an event occurs, node i ∈ {1, 2, . . . , L} receives independent and identically distributed (i.i.d.) samples from distribution f 0 . If an event occurs, and node i is affected by the event at an unknown time ν i , then it starts to receive i.i.d. samples from distribution f 1 , i.e., ν i is the change-point at node i. If ν i = ∞, node i will not be affected by the event ever. More specifically, if we denote the observation received by node i at time k by X i [k], then We assume that the event first affects a connected subset of nodes, which might be due to the locality of the event, and then dynamically propagates along the edges in the network (see Fig. 1 for an example). Equivalently, at every time step, the induced sub-graph on all the affected nodes is connected.
We consider a centralized setting in which a fusion center obtains the samples of all the nodes without delay. We are interested in sequentially detecting a "significant" event, i.e., one that affects at least η ≥ 1 affected nodes. If an alarm is triggered at a time when fewer than η nodes are affected, it is then considered as a false alarm event.
Let ν = {ν 1 , . . . , ν L }, which is unknown in advance. Without loss of generality, we assume that ν 1 ≤ ν 2 ≤ · · · ≤ ν L , with the ordering being unknown to the decision maker in advance. We note that ν i can be equal to ν i+1 , i.e., one node can affect more than one of its neighbors simultaneously. Then ν η is the first time when at least η nodes are affected by the event. Thus, our problem is to detect the change at ν η as quickly as possible subject to false alarm constraints.
For any ν, denote by C(ν) = {i : ν i < ∞} the set of all the indices of the nodes that will eventually be affected by the event. Then is the total number of affected nodes. If |C(ν)| = L, then all the nodes will be affected by the event eventually.
We use P ν to denote the probability measure of the samples with the set of change-points being ν, and let E ν denote the corresponding expectation. For a given ν, if |C(ν)| < η, i.e., ν η = ∞, then there are fewer than η nodes that will be affected under P ν . In this case, if an alarm is triggered, it is a false alarm. For any stopping time τ , to measure how frequently false alarms occur, we define the worst-case average run length (WARL) to false alarm as follows: Then with a larger WARL, we have fewer false alarms.
Let d i = ν i+1 − ν i denote the time it takes for the event to propagate from node i to node i + 1, for 1 ≤ i ≤ L − 1. If d i = 0, then node i and node i + 1 are affected simultaneously. Denote D := {d η , d η+1 , . . . , d L−1 }. For a fixed D, to measure how quickly we can detect when at least η nodes are affected, we define the worst-case average detection delay (WADD) using a criterion based on Pollak's criterion [37] as follows: We note that the supremum in (4) is only taken over We denote by F k the σ-algebra generated by the observations of all the nodes up to time k, for k = 1, 2, . . .. We wish to find a {F k } k∈N -stopping time that achieves "small" detection delay, while controlling the false alarm rate. More specifically, for any D, the goal is to minimize J D [τ ] subject to a constraint on the WARL: To describe the objective in words, we want to find stopping rules so that for all possible scenarios with fewer than η affected nodes, the average run length to false alarm is at least γ. At the same time, among those stopping rules that satisfy the false alarm requirement, we want to find the one that minimizes the WADD for all propagation dynamics after η nodes are affected. There is no guarantee that the optimization problem in (5) has a solution, since we require the same stopping rule to simultaneously minimize the WADD for all propagation dynamics after η nodes are affected. What we will show in the following sections is that such a "uniformly" optimum solution can be found up to a first-order approximation in an appropriately defined asymptotic setting.

Notation
We denote the samples across all the nodes at time k by X[k] = {X 1 [k], . . . , X L [k]}, and the samples across all the nodes from time which is the log-likelihood ratio for the samples at node i from time k 1 to k 2 . We use the following conventions: k 2 j=k 1 A j = 0 and k 2 j=k 1 A j = 1 if k 1 > k 2 . We use X + to denote the positive part of X, i.e., X + = max{X, 0}. We denote the Kullback-Leibler (KL) divergence between f 1 and f 0 as which is assumed to be positive and finite. We denote

Fully Connected Networks
In this section, we study fully connected networks, for which G is a complete graph. In this case, the event can propagate from any node to any other node, and the induced sub-graph on any subset of nodes is connected. We present the Spartan-CuSum (S-CuSum) algorithm, and show that it can be implemented efficiently with complexity that is linear in L at each time step. We then establish a lower bound on the WARL for the S-CuSum algorithm, and show how to choose the parameter to satisfy the false alarm constraint.

The S-CuSum Algorithm
We reformulate the quickest detection problem in Section 2 when G is a complete graph as a dynamic composite hypothesis testing problem, i.e., to distinguish the following two hypotheses at each time k: Both the null and alternative hypotheses are composite, since the data generating distribution depends on unknown ν under each hypothesis. This hypothesis testing procedure stops once a decision in favor of the alternative hypothesis is reached; otherwise, a new sample is taken from each node in the network.
To distinguish between the two hypotheses, we consider the log-likelihood ratio between them. Since ν under each hypothesis is unknown, we take a maximum likelihood approach with respect to the unknown ν, and construct the following generalized log-likelihood ratio statistic: The max in the numerator in (10) is taken over all ν such that L i=1 1 {ν i ≤k} ≥ η, where at time k, there are no fewer than η affected nodes. Likewise, the max in the denominator in (10) is taken over all ν such that L i=1 1 {ν i ≤k} < η, where at time k, there are fewer than η affected nodes. Since the network is fully connected, the induced sub-graph on any subset of nodes is connected. Therefore, the max in both the numerator and denominator in (10) is taken without explicitly enforcing connectivity and propagation dynamics.
The corresponding stopping time is then given by comparing W [k] against a pre-determined positive threshold:τ where b will be selected according to the false alarm constraint.

A Simpler but Equivalent Form
In this subsection, we develop an equivalent but much simpler form of (11), which can be computed with complexity O(L) at each time step.
Let P ∞ denote the probability measure with ν i = ∞, ∀1 ≤ i ≤ L. Then, it can be easily shown that Due to the fact that the first term in (12) is equivalent to Similarly, the second term in (12) is equivalent to If we denote the individual CuSum statistic [35] at node i (testing a change from f 0 to f 1 ) at time k as and define a permutation µ(·) such that which we refer to as the S-CuSum algorithm. Such an equivalence can be established as follows. (11) is The test in (18) can be implemented efficiently. First of all, for each node i, W i [k] can be updated recursively: Second, we do not need to sort all W i [k] at each time k. We only need to find the smallest L − η + 1 numbers from L numbers, which can be solved with O(L) computational cost using the algorithm in [38] instead of O(L log L). Thus, the total computational cost at each time k is O(L). Here, we note that, at time k, each node i may choose to send to the fusion center.

Lower Bound on the WARL
The following theorem provides a lower bound on the WARL for the S-CuSum algorithm. Theorem 1. The WARL for the S-CuSum algorithm in (18) is lower bounded as follows: where poly(b) denotes a polynomial of b.
Proof. To show the lower bound on WARL(τ (b)), it suffices to show that for any ν with For any t ∈ N and b > 0, it follows that By [13,Lemma B1], it then follows that where poly(b) has an order of L − η. Therefore, and b ∼ log γ, as γ → ∞.
Proof. The result follows from Theorem 1.

Asymptotic Analysis
In this section we study the asymptotic performance of the proposed S-CuSum algorithm in (18) and demonstrate its asymptotic optimality. For our asymptotic analysis to be non-trivial, we let not only the prescribed lower bound on the WARL, γ, go to infinity, but also d η , d η+1 , . . . , d L−1 . Indeed, if the latter variables are fixed as γ goes to infinity, then we will not be able to characterize how the propagation dynamics affects the performance, and the asymptotic performance will only depend on the number of nodes that will be affected eventually. Therefore, in order to perform a general and relevant asymptotic analysis, we let d η , d η+1 , . . . , d L−1 go to infinity with γ. Without loss of generality, suppose that In the following, we first present an example with L = 3 and η = 2 to understand the results, and then move on to the general results.

Example: L = 3 and η = 2
Consider a fully connected network with three nodes, i.e., L = 3. Our goal is to detect when at least two nodes are affected, i.e., η = 2. Then the S-CuSum algorithm is equivalent to comparing the sum of the smallest two individual CuSum statistics to a threshold b: For simplicity, we use the notion of phase i to denote the phase in which there are i affected nodes. Then after an event occurs, the network first changes from phase 0 to phase 1, then to phase 2, and eventually stabilizes in phase 3 (see Fig. 2). For this example, the (first-order) asymptotic optimality of the S-CuSum algorithm is characterized in the following proposition. Proposition 1. Let the threshold b ∼ log γ so that WARL(τ (b)) ≥ γ. Assume that d 2 and γ go to infinity as in (27), then the S-CuSum is asymptotically optimal: Proof. The proof for this special case is omitted. See detailed proof for the general case in Theorem 4.
The optimal performance shows a dichotomy depending on whether c 1 > 1 or not. In the following, we will first provide a heuristic explanation for the dichotomy, and then provide the results for the general case together with rigorous proofs.
Roughly speaking, if we consider the CuSum statistic at node i, before the change-point ν i , it is small and close to 0, and after the change-point, it grows with a positive slope of I. Therefore, by (28), the S-CuSum is close to 0 in phases 0 and 1, and grows with slope I in phase 2 and with slope 2I in phase 3.
If d 2 I > b, i.e., c 1 > 1, then the S-CuSum statistic crosses the threshold b within phase 2 (see Fig. 3). The detection delay for this case is b/I. If d 2 I ≤ b, i.e., c 1 ≤ 1, then the S-CuSum statistic is not large enough to cross the threshold b within phase 2, and it needs more samples from phase 3 (see Fig. 4). The detection delay is then equal to the sum of the duration of phase 2 and the number of samples needed from phase 3: Depending on whether or not phase 2 is long enough, the performance of the S-CuSum algorithm shows a dichotomy.

Asymptotic Universal Lower Bound on the WADD
In this subsection, we study the universal lower bound on the WADD for any stopping rule with the WARL no smaller than γ. We denote Proof. See Appendix B.
Theorem 2 suggests that to meet the asymptotic universal lower bound, an algorithm should be adaptive to the unknown d η , d η+1 , . . . , d L−1 . An intuitive understanding of h is that the algorithm shall stop within phase h + η, when there are h + η affected nodes.
The proof is based on a change-of-measure argument and a Law of Large Numbers argument for the log-likelihood ratio statistics, similar to those in [22]. However, a major difference in the change-of-measure argument compared to [22] is that the "pre-change" mode is composite, i.e., there are multiple possible scenarios with fewer than η affected nodes. Furthermore, the postchange statistic is more complicated, since the propagation dynamics is unknown, and the number of affected nodes is changing with time.

Asymptotic Upper Bound on the WADD
Recall that we can choose b ∼ log γ such that the false alarm constraint is satisfied. Then, by (27), it follows that as γ → ∞, where c i ∈ [0, ∞], for every i = 1, . . . , L − η.
An asymptotic upper bound on the WADD for the S-CuSum algorithm in (18) is characterized in the following theorem.
Proof. See Appendix C.
From Theorem 3, it is clear that although the S-CuSum algorithm does not exploit the knowledge of d η , d η+1 , . . . , d L−1 , the performance is still adaptive to the unknown d η , d η+1 , . . . , d L−1 . This is consistent with the insights from the asymptotic universal lower bound in Theorem 2. The proof of the asymptotic upper bound on WADD is based on partitioning the samples into independent blocks and applying the Law of Large Numbers for the log-likelihood ratio statistics, as in [22,Theorem 4]. The major difficulty here is due to the more complicated test statistic, in which the number of affected nodes changes with time.

Asymptotic Optimality of S-CuSum
We are now ready to establish the asymptotic optimality of the S-CuSum algorithm, which is presented in the following theorem.
Theorem 4 (S-CuSum, Asymptotic Optimality). Let the threshold b ∼ log γ so that WARL(τ (b)) ≥ γ. Assume that d η , d η+1 , . . . , d L−1 and γ go to infinity as in (27), then the S-CuSum algorithm is asymptotically optimal: In this section, we extend results of the previous section to arbitrarily connected networks.
Since we assume that the event propagates along edges in the network, the induced sub-graph on the affected nodes is then connected at each time step. Here, for an arbitrarily connected network, the induced sub-graph on some subset of nodes may not be connected. Therefore, although the S-CuSum algorithm still applies, and it is asymptotically optimal (up to a first-order approximation), it will have more false alarms due to the fact that it may raise alarms when it detects η nodes that are not connected (as we will show numerically in Section 6).
To exploit knowledge of the network structure, one can directly adapt the generalized loglikelihood ratio test in (10). However, this will involve a scan statistic over all possible propagation dynamics along the edges in the network, which leads to a combinatorial problem over a large search space. Therefore, it is computationally infeasible, especially for large networks.
In the following, we present the Network-CuSum (N-CuSum) algorithm, which not only employs knowledge of the network structure, but does so in a computationally efficient way. We then establish the first-order asymptotic optimality of the N-CuSum algorithm. Also, as will be shown in the numerical results in Section 6, for an arbitrarily connected network, the N-CuSum algorithm performs much better than the S-CuSum.

The N-CuSum Algorithm
At each time step k, we update the local CuSum statistics W i [k], ∀1 ≤ i ≤ L. We then compare each local CuSum statistic to a threshold log b, and delete this node if its CuSum statistic is less than log b. The resulting graph is then denoted by G [k]. For this step, the computational complexity is O(L). These deleted nodes are highly likely to be not affected by the event.
We run the BFS algorithm on G [k] to recover all connected components of G [k]: C 1 [k], C 2 [k], .... The computational complexity for this step is at most O(L + |E|). We then run the S-CuSum algorithm on each connected component, and use S-CuSum i [k] to denote the test statistic value of the S-CuSum algorithm on C i [k]: If any of these statistics crosses the threshold b, we stop and raise an alarm. For this step, the computational complexity is less than O(L). Therefore, the overall computational complexity at each time step is O(L + |E|), which means that this algorithm scales well as the network size grows, assuming that the network is not dense. The N-CuSum algorithm is described in detail in Algorithm 1.

Performance Analysis of the N-CuSum Algorithm
We next provide theoretical analysis for the N-CuSum algorithm. The following theorem provides a lower bound on the WARL for the N-CuSum algorithm.

Algorithm 1 N-CuSum
Input: G : graph η: size of sub-graph of interest f 0 , f 1 : distributions before and after change b: threshold Output: τ : stopping time Initialization: Theorem 5. The WARL for the N-CuSum algorithm is lower bounded as follows: Proof. It can be shown that S-CuSum i [k] is less than the S-CuSum statistic applied on the whole network, for any i and k. Therefore, Together with Theorem 1, this completes the proof.
To guarantee WARL(τ (b)) ≥ γ, it suffices to choose b such that and b ∼ log γ.
For the asymptotic analysis, we choose the same asymptotic setting as in (27) in Section 4. By choosing b ∼ log γ, we also have (33). We then have the asymptotic upper bound on the WADD for the N-CuSum algorithm as characterized in the following theorem.
Proof. See Appendix D.
The proof of the asymptotic upper bound is similar to that of Theorem 3, but requires a more careful construction. This is due to the dependency between the individual S-CuSum statistics and the partition of the graph G [k] into connected components. The asymptotic universal lower bound in Theorem 2 also applies to the arbitrarily connected network here. We then establish the asymptotic optimality of N-CuSum in the following theorem.

Numerical Results
In this section, we present some numerical results. We start with an example to demonstrate a typical evolution path of the S-CuSum algorithm. We then study a fully connected network with three nodes. We compare the S-CuSum algorithm to a generalization of the multi-chart CuSum algorithm in [36] which stops when at least η local CuSums have crossed their individual thresholds. Finally, we study a network that is not fully connected, a lattice network with 36 nodes. In this example, we also consider a network generalized multi-chart CuSum algorithm, which is to wait until η local CuSums have crossed their individual thresholds simultaneously, and those nodes form a connected subgraph. We compare the generalized multi-chart CuSum, the network generalized multi-chart CuSum, the S-CuSum and the N-CuSum algorithms. For all four algorithms, the communication complexity at each time step is L, since we are considering a centralized setting.
In Fig. 5, we plot the evolution paths of the S-CuSum statistic and all the individual CuSum statistics. We consider a fully connected network with L = 3 and η = 2. We choose f 0 = N (0, 1), and f 1 = N (1, 1). We set ν = {1, 40, 80}. There are in total three phases, depending on the number of affected nodes. In phase 1, i.e., k < 40, only the statistic of CuSum 1 grows with a positive slope, and all the other statistics are small and close to zero. Then, in phase 2, i.e., 40 ≤ k < 80, the statistics of CuSum 1, CuSum 2 and S-CuSum grow with a positive slope. In this phase, the S-CuSum statistic is almost the same as the CuSum 2 statistic, which is due to the fact that the S-CuSum statistic is the sum of the smallest two individual CuSum statistics, i.e. CuSum 2 and CuSum 3. Since the CuSum 3 statistic is small and close to zero, the S-CuSum statistic is almost the same as the CuSum 2 statistic in this phase. Eventually, in phase 3, i.e., k ≥ 80, the statistics of CuSum 1, CuSum 2 and CuSum 3 all increase with a positive slope. In this phase, the S-CuSum statistic is the sum of the CuSum 2 and CuSum3 statistics. Therefore, the slope of the S-CuSum statistic is larger than that in phase 2.
In summary, the S-CuSum statistic is small and close to zero with only one affected node, and gradually grows but with different slopes with two and three affected nodes. Thus, the S-CuSum algorithm is adaptive to the unknown propagation dynamics.
We then study the performance of the S-CuSum algorithm, validate our theoretical assertions, and compare it with a generalization of the multi-chart CuSum algorithm in [36], which stops when   N (0.4, 1). Here we choose ν 1 = ν 2 = 1, and d 2 = 40 to simulate the average detection delay, and choose ν 1 = 1, ν 2 = ν 3 = ∞ to simulate the average run length to false alarm. The plot is averaged over 1000 runs. We plot WADD versus WARL for the S-CuSum algorithm and the generalized multi-chart CuSum algorithm in Fig. 6.
In Fig. 6, the slope of the curve corresponding to the S-CuSum algorithm gradually changes after WADD= 40, which validates our theoretical results in Proposition 1. Furthermore, the S-CuSum algorithm performs better than the generalized multi-chart CuSum algorithm, especially when WADD≥ d 2 . This is because when WADD≥ d 2 , there are three affected nodes. The samples from the third affected node also contain information about whether there are no fewer than η affected nodes (although the local CuSum statistic at the third affected node is not large), and this information is used by the S-CuSum algorithm but not the multichart CuSum algorithm.  We next consider a lattice network with 36 nodes (see Fig. 7). We set η = 4, f 0 = N (0, 1), and f 1 = N (1, 1). To simulate the average detection delay, we assume that the event first affects nodes 14, 15, 16, 22 at time 1, then propagates to nodes 9 and 17 at time 10, and no other node is affected by the event. To simulate the average run length to false alarm, we assume that nodes 14, 15, 16 are affected at time 1, and no other node is affected by the event. We repeat the simulation for 1000 times. As we can see from Fig. 8, the N-CuSum algorithm has the best performance among the four algorithms. Compared to the S-CuSum algorithm, the N-CuSum algorithm has significantly reduced the WADD, which is due to its effective exploitation of the network structure.
Under the same setting used to simulate the average detection delay as shown in Fig. 8, we plot the average number of connected components in G [k] when N-CuSum crosses the threshold b, as a function of the threshold b in Fig. 9. We observe that as b increases, the average number of connected components decreases, and its value is between 1.5 and 4.5. This is because for large b, the unaffected nodes are eliminated with high probability, and the resulting graph G [k] contains mostly the affected nodes.  Lastly, we compare the computational complexity of these four algorithms. We consider the lattice network in Fig. 7, and all nodes are not affected. We run these algorithms for 10000 steps without stopping (b = ∞), and repeat the experiment for 100 times. We run the experiment on a 2.0GHz Intel core i5 CPU using Matlab. The average time consumption for 10000 steps (in seconds) is given in Table 1. We observe that these four algorithms run very fast with 10000 steps; in particular, they all take less than one second. Also, not surprisingly, the algorithms that exploit the network structure consume more computational power.

Discussion
The results in this paper can be easily generalized to the case in which the distributions of the samples are different across the different nodes (heterogeneous sensors). For example, the generalized S-CuSum algorithm for this case is also constructed by comparing the sum of the smallest L − η + 1 local CuSums to a threshold b, and hence can be implemented efficiently. The asymptotic optimality of this algorithm can also be established similarly, but the optimal performance takes on a more complicated form.
In our modeling and analysis we have assumed that only one connected subgraph in the network is affected. In an arbitrarily connected network, if two or more events occur simultaneously, two or more connected subgraphs could be affected. Here the goal might be to detect whether there exists one event that has affected at least η nodes; two small events are not of interest. Then the S-CuSum algorithm is clearly not going to be asymptotically optimal in this case, because it clubs all the affected nodes together ignoring the network structure. The N-CuSum algorithm is also not going to be asymptotically optimal for the following reason. The N-CuSum algorithm may not be able to correctly delete the unaffected nodes, and this could result in bridges between small affected connected subgraphs (with less than η affected nodes) to form a big one (with more than η nodes), thus triggering false alarms. Developing an efficient algorithm to handle the case of two or more events occurring simultaneously is an interesting open problem.
In this paper, the observations are assumed to be i.i.d. before and after the change-point at each node. It is clearly of interest to generalize to the case with non i.i.d. observations using tools described in [5]. Furthermore, we assumed that the data generating distributions before and after a node is affected by the event are known. In many practical applications, this assumption may not hold, and it is of further interest to construct algorithms that do not rely on complete knowledge of the distributions and still provide good performance [39,40]. In this paper, we consider a centralized setting, in which all the samples are available at a fusion center. It is also of practical interest to extend our results to the distributed setting where this is no fusion center and the sensors communicate directly with each other. The adversarial setting is also worth exploring, in which some nodes may be comprised by an adversarial party.

A A Useful Lemma
We recall the following useful lemma, which is a slight generalization of the Weak Law of Large Numbers.
For simplicity, for any > 0, denote By the Markov inequality, It then suffices to show that as γ → ∞.
We denoteν = {1, . . . , 1, ∞, ..., ∞} with the first η − 1 elements being 1, and all the remaining elements being infinity. Clearly, under Pν, there are η − 1 affected nodes. For any stopping time τ that satisfies the false alarm constraint, we have which implies that for each m < γ, there exists some ν ≥ 1, such that This can be shown by contradiction as in [22,Theorem 1].
By a change of measure argument, it follows that where a will be specified later.
The event {τ ≥ ν η } only depends on X[1, ν η − 1], which follows the same distribution under P ν and Pν. This implies that It then follows that Due to the fact that for any events A and B, P(A ∩ B) ≥ P(A) − P(B c ), it follows that where (a) is due to the fact that the event {τ ≥ ν η } only depends on X[1, ν η − 1], which is independent from X[ν η , j], ∀ν η ≤ j ≤ ν η + α γ .
Combining (51) and (52), we obtain By (48), it follows that for m = α γ , there exists ν η , such that We then show the second term in (52) also converges to 0 as γ → ∞. It can be shown that where the last step follows by applying Lemma 1.

C Proof for Theorem 3
By the recursive structure of (W i [k]) + , and the fact that it is always non-negative, and is zero when k = 0, then for a given D, the worst-case of the average detection delay is achieved when ν 1 = · · · = ν η = 1.
where (a) is due to the fact that (W i [k]) + is always non-negative, especially for i / ∈ C(ν).
Define the stopping rule N (b): It then follows thatτ (b) ≤ N (b). Therefore, it suffices to establish an upper bound on the E ν [N (b)].
For simplicity, for any > 0, we denote We then bound P ν (N (b)/α b > ) for every ≥ 1 as follows: where (a) is by the definition of W i [k]; (b) follows by the independency among the random variables: We then bound the first term in (62). It follows that It is clear that for large b, Moreover, i∈C Z i [max{1, ν i }, α b ] is the summation of the log-likelihood ratios of the samples from f 1 . Therefore, for any C ⊆ C(ν) such that |C | = |C(ν)| − η + 1, i∈C Z i [max{1, ν i }, α b ] is the sum of the log likelihood ratio between f 1 and f 0 of at least number of samples generated by f 1 . Then by the Weak Law of Large Numbers, it follows that in probability, where β > 1. Therefore, as b → ∞, Together with (63), this further implies that where δ and δ can be arbitrarily small for large b.
Following similar steps, we can also show that each term in (62) is upper bounded by δ for large b. Therefore, and This implies that Due to the fact that is chosen arbitrarily and δ can be arbitrarily small for large b, as b → ∞, (1)).
This concludes the proof.

D Proof of Theorem 6
By the recursive structure of (W i [k]) + , and the fact that it is always non-negative, and is zero when k = 0, then for a given D, the worst-case of the average detection delay is achieved when ν 1 = · · · = ν η = 1. We use the same notation of α b as in (60). It can be shown that Recall that we assume that ν 1 ≤ ν 2 ≤ · · · ≤ ν L , and for large b, we have (64). We then denote H = {1, 2, . . . , h} as the set of indices of nodes that have changed their distribution by the time α b . For simplicity of notation, we use S-CuSum H [k] to denote the test statistic value of the S-CuSum on H at time k: We next bound P ν (τ (b)/α b > ) for every ≥ 1 as follows: where A[k] and B[k] are two events defined as follows: it then follows that This further implies that We then define It is clear that ∀1 ≤ j ≤ , and A [jα b ] only depends on the samples from (j − 1)α b + 1 to jα b .
We then analyze B[k]. By the assumption that the event propagates along the edges in the network, the sub-graph induced on H is connected. If ∀j ∈ H, W j [k] ≥ log b, then there must exist C i [k] that contains all nodes in H. Therefore, it follows that We then define It follows that ∀1 ≤ j ≤ , and B [jα b ] only depends on the samples from (j − 1)α b + 1 to jα b .
Combining (83) and (88), equation (75) can be further bounded as follows: Following similar steps as from (62) to (69), we can show that for large b, ∀1 ≤ j ≤ , where δ can be arbitrarily small.
Then by the Weak Law of Large Numbers, for large b, we obtain where δ can be made arbitrarily small for large b, and hence so can δ. Therefore, and following steps similar to those in (70) to (72), we conclude the proof.