Change Detection for Large Distributed Sensor Networks With Multitriggered Local Sensors

The emergence of large sensor networks and the Internet of Things has reinvigorated interest in distributed quickest change detection. Important shortcomings of existing approaches include difficulty of design, inflexibility in communication, and limited applicability to larger networks. The new approach proposed in this work features local sensors that can be triggered multiple times, i.e., that can reset and continue monitoring after transmitting their decisions. With larger sensor networks as a focus, the system allows for multiple simultaneous transmissions to a fusion center within bandwidth limitations. The proposed system uses the cumulative-sum procedure at local sensors to binarize local decisions, which are then transmitted to a fusion center that also employs cumulative-sum quickest detection. Test overdesign due to sequential test overshoot is avoided, and global and local thresholds are chosen to meet a desired false alarm rate constraint using numerical computation of expected delay performance. The system compares favourably to several existing methods while offering greater flexibility in the amount of fusion center communication.


I. INTRODUCTION
Continuous advancement in sensor technology and wireless networking has made it feasible to widely deploy distributed decision-making systems [1], allowing decisions to be made from vast amounts of data with increased efficiency, robustness and accuracy. The Internet of Things (IoT) is an embodiment of this trend that seeks to establish itself in everyday life and is impacting industries such as manufacturing, healthcare, transportation, retail and utilities [2], [3]. A critical consideration in the design of IoT systems leveraging large sensor networks lies in achieving efficient decision fusion [4], [5], in terms of decision performance relative to communication cost.

A. MOTIVATION
In order to realize the potential brought about by distributed decision-making systems, renewed attention has been directed towards the application of sequential detection [6], whereby at each time slot a decision-making system either stops and declares a decision on the process observed, or continues to observe measurements. The two main types of sequential detection problems are hypothesis testing, addressed by the sequential probability ratio test (SPRT) [7], and change detection, addressed by the cumulative sum (CUSUM) algorithm, which can be expressed as a repeated SPRT [8]-[11]. Problems categorized as sequential hypothesis testing seek to identify the correct hypothesis consistent with a set of observed measurements. In these problems, the objective is to take as few observations as possible, thereby minimizing delay while meeting an acceptable prespecified probability of misclassification. (The associate editor coordinating the review of this manuscript and approving it for publication was Mohamad Afendee Mohamed.)
Sequential change detection problems, on the other hand, seek to identify a change of events from a set of measurements. In change detection problems, measurements are typically assumed to be initially generated according to a specified probability distribution. At an unknown time, a change in probability distribution abruptly occurs. Similar to sequential hypothesis testing, tests are designed to determine as quickly as possible when a change occurs, subject to meeting an acceptable error rate; i.e., the goal of such tests is to minimize decision delay within a given false alarm constraint.
Specific use-cases of sequential detection procedures within distributed, decentralized systems include cognitive radio, including jammer detection and localization [12]; monitoring and data collection systems for critical infrastructure such as power grids, oil and gas networks, and roadways [6]; malicious user detection for mobility tracking in autonomous driving [13]; and optimized sensor networks under communication constraints for control applications [5]. In these applications, large geographical areas are monitored, thereby requiring data from a large number of sensors at different locations. Typically, due to bandwidth restrictions, cost or reliability, it is expensive to have each sensor forward its observations to the fusion center (FC) to make a decision, requiring the decision-making process to be decentralized [14]. Besides outdoor uses, there is significant potential for wireless IoT networks in a healthcare setting [15], [16]. Distributed sensing can facilitate remote health monitoring, fitness programs, chronic disease management and elderly care. Various medical and non-medical devices, such as sensors installed within the home and imaging devices, are being proposed for use as smart devices that provide clinicians with valuable information to assist in making insightful decisions about a patient's care. The integration of IoT within healthcare is expected to reduce medical costs, increase quality of life and enrich the user's experience [17].
As evident from the use-cases discussed above, future IoT systems are evolving to employ many sensors. Current approaches to sequential detection in distributed systems do not scale well to systems with many sensors, because they neglect test threshold overshoot and their design complexity increases with network size. In addition, in certain applications, local sensor data may continue to be received after a local decision is made. This motivates investigating new distributed sequential detection procedures that overcome these limitations.

B. RELATED WORK
Early research first treated local sensors as physical extensions of the FC, enabling detection decisions to be centralized. Extensions to quickest detection of changes appear in [18] and [19]. Specific solutions have also been proposed, e.g., for two sensors [20] or based on nonparametric statistical models [21]. Alternatively, [22] addresses a decentralized quickest detection scenario where only an unknown subset of local sensors experience the change in distribution. The motivation for distributing decision-making to local sensors is, in large part, bandwidth constraints, as wireless spectrum is a limited resource, as well as energy constraints [23]. An extreme example is the fully decentralized ad hoc IoT network without a FC, which was proposed for sequential change detection in [24], [25].
Bayesian formulations to optimize the distributed quickest change detection problem have been investigated for certain cases. Examples include the ad hoc network in [24], as well as [26], where communication is highly restricted in that only a single report can be received at the FC in each time slot; additional reports are ignored by the FC, discarding valuable information. The approach in [19] assumes a known change-time distribution as well as full feedback from the FC to local sensors, which is resource demanding. Bayesian quickest detection also has practical issues that limit applicability. First, the required a priori probability of change and a priori change-time distribution are often not available. In addition, the dynamic programming solution to Bayesian formulations has exponential complexity as well as very high sensitivity to the assumed a priori probability of change.
Alternatively, minimax quickest change detection criteria have led to the CUSUM algorithm [8]-[11] with optimality for conventional sensing, although globally optimal minimax solutions have eluded distributed networks thus far. Several minimax approaches have been shown to possess desirable asymptotic properties. In [27], a decentralized solution uses CUSUM asynchronously at local sensors. A one-shot scheme that informs the FC of a change after the first local sensor decision is made, also known as minimal combining, is shown to perform well asymptotically as the mean time between false alarms tends to infinity. However, its performance advantage is less clear at specific false alarm rates, and explicit design procedures to satisfy maximum system error rates were not provided. Recently, Biswas et al. considered a non-Bayesian quickest change detection approach that addresses quantization in transmission as well as energy harvesting [28].
In [29], a simultaneous change in distribution across all local sensors is monitored. Two one-shot procedures that utilize CUSUM at local sensors are proposed and analyzed. In the first, each local sensor may communicate with the FC only once, with no test reinitialization. The FC concludes a change has occurred as soon as m local CUSUM messages out of a total of K local sensors are received by the FC. The second procedure makes a change decision when m out of K local sensor CUSUMs agree simultaneously that a change has occurred [29]. For each case, non-asymptotic lower bounds on expected time to false alarm, and asymptotic upper bounds on worst case detection delay, are obtained. The limitations of this approach are (1) it lacks a design method to achieve a given false alarm rate, (2) local decisions are made on a one-shot basis, and (3) its performance has not been investigated for more than 4 sensors.
In the systems considered in [30] and [31], local sensors process observations using local log-likelihood ratio statistics, similar to the SPRT, with a set of predefined constant thresholds. A single bit is transmitted to the FC when the local statistic crosses above or below the set of thresholds, which is termed level triggering. The FC uses an SPRT on the asynchronously arriving bits. Through an asymptotic performance analysis, the benefit of using additional bits to encode the sequential test overshoot at local sensors is assessed. A threshold design procedure to achieve a given error performance was not presented; instead, offline simulation is employed.
In each of [26], [27], [29]-[31], scalability to networks with large numbers of sensors is challenging. With the exception of [30], the occurrence of sequential test overshoot at local sensors is ignored. In networks with many local sensors, the accumulated impact of overshoot can be significant. The system designs of [32] and [33] account for local sensor overshoot but address the distributed hypothesis testing problem rather than quickest change detection. In [32], overshoot is encoded into a time delay between a local sensor's decision time and its report transmission time. This time delay is then decoded by the FC to determine the amount of overshoot. While accounting for overshoot, this scheme adds delay to the system. In [33], quantization strategies used by local sensors to summarize their observations are examined. Such strategies may have potential for capturing overshoot in summary messages, but complicated quantization schemes incur extra overhead and complexity for low-power sensors. The issue of how to efficiently scale up these quantization strategies for large sensor networks was not addressed.
The more recently proposed approaches are categorized in summary form in Table 1. Not all address quickest change detection; some instead address sequential hypothesis testing [30], [31] or decision fusion [4]. These previous schemes have local sensors that are one-shot, i.e., they shut off after they make a decision and ignore future local observations. The system in [24] is distinct in not having a fusion center. Both [26] and [24] have the drawback of requiring assumed a priori change-time statistics. None of the networks for quickest change detection permit simultaneous reports from multiple local sensors. Most previous approaches are motivated by asymptotic properties rather than explicit design procedures with finite-sample analysis to determine suitable sensor thresholds. Given a maximum error constraint, test thresholds that closely meet this constraint are desired. Over- and under-designed tests are especially problematic in large networks, where impacts on performance accumulate. Finally, none of the approaches listed in Table 1 are tailored specifically to bandlimited fusion center based networks with large numbers of local sensors.

C. CONTRIBUTIONS AND ORGANIZATION
This paper fills a gap in the literature by addressing the design of minimax quickest change detection based on independent observations for distributed systems with large numbers of sensors that can each be reset multiple times and transmit multiple reports to a FC. By extending direct numerical performance analysis methods, joint local sensor and FC test thresholds can be designed to closely meet overall specifications. A key element of the direct computational approach is the incorporation of the effect of overshoot that occurs for resetting CUSUM tests on observations at local sensors as well as overshoot for CUSUM tests on reports at the FC. In addition, direct performance analysis is applicable to arbitrary discrete probability density functions to accommodate desired quantization schemes besides the binary one proposed here. The main contributions are summarized as follows:

1) By generalizing a recursive method proposed for Gaussian-based SPRT design in [34] and [35], numerical techniques are presented to analyze the performance of SPRT, CUSUM and repeating-CUSUM procedures that account for test overshoot, greatly improving accuracy over Wald's approximations. These are applied in tandem to continuous distributions at local sensors, which then feed sensor reports that obey discrete distributions to the FC.
2) The proposed distributed system is based on per-node optimal minimax quickest change detection that allows for sensors that reset and continue monitoring after transmitting their local decisions, within an overall bandwidth restriction at the FC.
3) The false alarm probability and average delay performance analysis is applied to test threshold design to create a methodology that trades off delay and bandwidth while closely meeting a specified overall false alarm rate constraint.
The rest of the paper is organized as follows: Section II presents the system model and formulates the minimax distributed quickest change detection problem. Section III presents numerical techniques used to analyze the performance of the SPRT, CUSUM and distributed systems. Sections IV and V present a distributed, decentralized system based on a minimax (CUSUM) quickest change detection framework with the restriction of FC communication bandwidth. Section VI presents a threshold design methodology for the proposed system. In Section VII, the accuracy of the design method, performance tradeoffs, and comparisons with existing systems are quantified.

D. NOTATION USED
By default, capital letters, X, denote random variables, and the associated probability density and cumulative distribution functions are denoted by f(x) and F(x), respectively, whether discrete or continuous, where x is a realization of X. Random variables subscripted by lower case k denote discrete time. On occasion, probability distributions under hypotheses H_0 and H_1 are referred to as Q_0 and Q_1, respectively. Observations at local sensors are typically continuous-valued, whereas reports observed at the fusion center are discrete-valued, e.g., r_k is a realization of the number of reports observed by the fusion center at time k. The expressions P_θ[A] and E_θ[A] denote the conditional probability and expectation of event A, given parameter θ. Superscripts are used only when necessary to distinguish time and sensor: X_k^i refers to a random variable observed at sensor i at time k. Otherwise, sensor indices are dropped. All other specific functions are defined where used.

II. SYSTEM MODEL
It is assumed that observations between sensors are mutually independent conditioned on hypothesis H 0 or H 1 . This would occur, for example, when sensors are physically well separated in space. Observations at the same sensor are also assumed to be independent and identically distributed (IID) before and after a single abrupt change from H 0 to H 1 , i.e., that they are monitoring a statistically homogeneous environment under H 0 . At some point in time, the observations at all sensors abruptly and simultaneously change from H 0 to H 1 .
This change can be described as

X_k ∼ Q_0 for k < ν, and X_k ∼ Q_1 for k ≥ ν,

where ν represents an unknown deterministic change time, and Q_0 and Q_1 are the probability distributions before and after the change. A CUSUM test, with known optimality properties [10], is performed at local sensors. In the following, when referring to an arbitrary sensor, superscript notation is omitted. When a local sensor decides that a change in distribution has occurred, it transmits a binary-valued report indicator to the FC, which, upon collecting information from local sensors, acts as the final global decision maker. Over time, the FC thus observes a sequence of numbers of reports, r_k ∈ {0, 1, 2, . . . , C}, where a value of r_k = j means that j sensors have made a decision and sent reports to the FC in time slot k. If no report is received, the value of r_k is zero. To reflect network bandwidth limitations, the number of reports received in a given time slot is limited to a value no larger than C, the number of available channels, where C ≤ M, with M the number of sensors. The time sequence of reports r_k, at times k = 0, 1, 2, . . ., updates a cumulative statistic at the FC.
Let the FC's expected decision delay and false alarm rate be expressed, respectively, as

ADD(δ) = E[(τ_F − ν)^+],   FAR(δ) = 1 / ARL-to-FA,

where τ_F is the FC's decision time, ν is the actual change time, (x)^+ = max(x, 0), and ARL-to-FA is the average run length to false alarm, i.e., the expected time until the FC declares a change when no change ever occurs. The FC's decision policy, δ, aims to minimize expected decision delay subject to a false alarm constraint:

minimize over δ: ADD(δ) subject to FAR(δ) ≤ α_FC,

where α_FC is the desired maximum rate of false alarms. A network architecture is assumed in which local sensors have only one-way, error-free communication with the fusion center and no communication with one another.

III. NUMERICAL COMPUTATION OF FINITE-SAMPLE SPRT AND CUSUM PERFORMANCE
In order to analyze the delay and false alarm performance of the CUSUM tests used in the distributed sensing system as a function of time, core recursive performance computations are first introduced for the SPRT; these satisfy a Fredholm equation [35] and are generalizations of expressions found in [34], [36]. Unlike the traditional Wald approximations [37], these expressions account for overshoot in the design of test thresholds. They are then extended to analyze CUSUM tests.

A. SPRT DIRECT ANALYSIS
Consider the sequential probability ratio test (SPRT) for binary hypothesis testing of IID observations proposed by Wald [38], which has well-known optimality properties [7]. This test tracks the likelihood ratio of observations X_i observed sequentially at times i = 1, 2, . . . through the accumulated likelihood statistic at time k,

G_k = Σ_{i=1}^{k} L_i,   (8)

where the per-sample log-likelihood ratio L_i ≡ ln(q_1(X_i)/q_0(X_i)), for probability density functions (or probability mass functions if discrete) q_0 ∼ Q_0 and q_1 ∼ Q_1. At each stage of this test, a realization, g_k, of G_k is compared with two precomputed thresholds a and b. The SPRT decision rule is as follows [37]: choose H_1 if g_k ≥ b; choose H_0 if g_k ≤ a; otherwise, take another observation, continuing to the next stage indefinitely until a decision is reached.
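As a concrete illustration, the decision rule above can be sketched in a few lines; the N(0,1)-versus-N(1,1) model (whose per-sample LLR is x − 0.5) and the threshold values here are illustrative assumptions, not the paper's design choices:

```python
import math
import random

def sprt(samples, a, b, llr):
    """Wald's SPRT: accumulate the log-likelihood ratio and compare with
    thresholds a < 0 < b; returns (decision, number of samples used)."""
    g = 0.0
    for n, x in enumerate(samples, start=1):
        g += llr(x)                 # accumulate the per-sample LLR
        if g >= b:
            return "H1", n          # upper threshold crossed
        if g <= a:
            return "H0", n          # lower threshold crossed
    return None, len(samples)       # no decision within the available data

# Illustrative run: N(0,1) vs N(1,1), so llr(x) = x - 0.5, with H1 true.
random.seed(0)
data = [random.gauss(1.0, 1.0) for _ in range(1000)]
decision, n = sprt(data, a=math.log(1e-6), b=math.log(100), llr=lambda x: x - 0.5)
print(decision, n)
```

Since the LLR drifts upward at rate 0.5 per sample under H_1, the test typically stops at the upper threshold after roughly b/0.5 ≈ 9 samples.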
The objective is to determine the probabilities of error of (8) as well as the expected number of samples required to reach a decision. The notation f (·) refers to the common probability density function (PDF) (or probability mass function (PMF) if discrete), assuming it exists, of the L i , which form a statistically independent sequence in time.
Without loss of generality, in the following it is assumed that H_1 is true; if H_0 is true, the derivation is similar. Denote by f_k(x) the conditional PDF (or PMF) of SPRT statistic (8) at stage k given that the test reaches stage k. Noting that the test reaches stage one with probability one, the probability of reaching stage k, for k ≥ 2, can be expressed in terms of the conditional probabilities

P(reach stage k) = Π_{j=1}^{k−1} P(reach stage j + 1 | reach stage j).   (9)

From (8) and the definition of f_k(x),

P(test reaches stage k + 1 | test reaches stage k) = ∫_a^b f_k(x) dµ(x),   (10)

f_{k+1}(x) = (f_k^trunc ∗ f_1)(x),   (11)

where measure notation µ(·) may refer to Riemann integration or summation, depending on whether f_k(x) is continuous or discrete; ∗ denotes convolution; f_k^trunc denotes f_k truncated to the interval (a, b) and renormalized; and the initial term of Eq. (11), f_1(x), is the PDF of a single log-likelihood ratio observation. This follows from the fact that the PDF of a sum of independent random variables is a convolution. Eq. (11) performs the normalization transformation required so that f_{k+1}(x) is a PDF after truncation. In summary, the probability of reaching each SPRT stage is obtained by successive convolutions using numerical integration, integrating the result over the region between the SPRT thresholds.

The probability of choosing H_1 at stage k given that stage k is reached is computed using (11) as

γ_k = ∫_b^∞ f_k(x) dµ(x) = 1 − F_k(b),   (13)

where F_k(·) denotes the cumulative distribution function of f_k(·). Using (9) and (13), the probability of choosing H_1 at any stage is then

P(choose H_1) = Σ_{k=1}^{∞} γ_k P(reach stage k).   (14)

The average test length of an SPRT given H_1, E_1[T], can be straightforwardly shown to be

E_1[T] = Σ_{k=1}^{∞} k P(test terminates at stage k).   (15)

When (14) is truncated to a finite number of stages, N, a lower bound on P(choose H_1) is obtained. An upper bound can be obtained by adding the probability of reaching stage N + 1 to this lower bound. These bounds can be used to determine at which stage it is appropriate to terminate execution of the recursive procedure: a value of N is found that, if exceeded, will insignificantly affect the computed delay and error probabilities.
In numerical computations, a value of N is selected such that the probability contributed by continuing to stage N + 1 is at least three orders of magnitude smaller than the accumulated probability. This magnitude difference between delay stages defines the termination criterion used in Algorithm 1.
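The recursion (9)-(15) can be sketched end-to-end for a discrete toy model in which the convolution is exact: Bernoulli observations with p = 0.3 under H_0 and p = 0.7 under H_1, whose per-sample LLR is ±ln(7/3), so the statistic lives on an integer lattice of steps; the ±3-step thresholds are likewise assumed for illustration. On this lattice there is no overshoot, so the results can be checked against the gambler's-ruin closed forms:

```python
# Direct SPRT analysis for a discrete toy model (assumed values):
# Bernoulli(0.3) vs Bernoulli(0.7); the LLR takes values +/- ln(7/3),
# so the statistic is tracked on an integer lattice of +/-1 steps.
p1 = 0.7
A, B = -3, 3                        # lower/upper thresholds, in lattice steps
step_pmf = {+1: p1, -1: 1.0 - p1}   # LLR step distribution under H1

f = dict(step_pmf)                  # f_1: PMF of the statistic at stage 1
p_reach = 1.0                       # P(test reaches the current stage)
p_h1 = 0.0                          # running P(choose H1), cf. (14)
mean_T = 0.0                        # running E_1[T], cf. (15)
for k in range(1, 500):
    hit_hi = sum(p for s, p in f.items() if s >= B)   # mass at/above b, cf. (13)
    hit_lo = sum(p for s, p in f.items() if s <= A)   # mass at/below a
    p_h1 += p_reach * hit_hi
    mean_T += k * p_reach * (hit_hi + hit_lo)
    cont = 1.0 - hit_hi - hit_lo                      # continuation prob, cf. (10)
    if cont <= 0.0 or p_reach * cont < 1e-15:
        break                                         # negligible further stages
    # cf. (11): truncate to (A, B), renormalize, convolve with the step PMF
    trunc = {s: p / cont for s, p in f.items() if A < s < B}
    nxt = {}
    for s, p in trunc.items():
        for d, q in step_pmf.items():
            nxt[s + d] = nxt.get(s + d, 0.0) + p * q
    f, p_reach = nxt, p_reach * cont

print(p_h1, mean_T)   # ~0.92703 and ~6.4054 (gambler's-ruin closed form)
```

For this ±1 random walk absorbed at ±3, the gambler's-ruin formulas give P(choose H_1) = (1 − (3/7)^3)/(1 − (3/7)^6) ≈ 0.92703 and E_1[T] ≈ 6.4054, which the recursion reproduces.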

B. CUSUM DIRECT ANALYSIS
The above analysis procedure can be extended to the CUSUM test for change detection, which tracks the statistic S_k, the maximum of log-likelihood ratio tests over all possible change times up to time k, starting at S_0 = 0 until it exceeds the CUSUM log threshold h:

S_k = max( 0, max_{1≤j≤k} Σ_{i=j}^{k} L_i ).   (17)

Page [8] showed that S_k can be expressed as a repeating sequence of SPRTs with boundaries 0 and h: each time the SPRT crosses the lower boundary, it is reset to zero. This can be seen by rewriting (17) as

S_k = max(0, S_{k−1} + L_k).   (18)

Two different types of methods are proposed for analysis:

1) ONE-SIDED SPRT CUSUM ANALYSIS
A CUSUM test has lower and upper log-thresholds 0 and h, respectively, where detection threshold h is set by the system designer. The CUSUM test can be analyzed using a procedure similar to that shown for the SPRT in Section III-A. As in the case of the SPRT, the PDF of the CUSUM statistic at stage k, given that it reaches stage k, can be computed using the fact that the log-likelihood ratio observations are IID.
A CUSUM test will only terminate if the statistic S_k exceeds threshold h. The expression in (10) becomes

P(test reaches stage k + 1 | test reaches stage k) = ∫_{−∞}^{h} f_k(x) dµ(x),

where f_k(x) is given by the convolution in Eq. (11) with a = 0 and b = h. When a CUSUM test statistic at time (or test stage) k, S_k, falls below the lower threshold 0, the test reinitializes itself by setting S_k = 0 and continuing operation. At CUSUM stage k, the probability that the test resets,

P(reset at stage k) = ∫_{−∞}^{0} f_k(x) dµ(x),

is added to f_k^trunc(0) of the current stage's truncated PDF prior to the next observation. This simple computation accounts for reinitialization in the subsequent CUSUM stage's PDF, and motivates this direct numerical computation approach. The probability of choosing H_1 at stage k + 1 given that stage k + 1 is reached, and the decision delay (or average run length) under either hypothesis, are similarly found using (13) and (15) with this new PDF. The unconditional probability of accepting H_1 at a given stage k is computed as

P(choose H_1 at stage k) = γ_k P(reach stage k).   (22)

The above CUSUM analysis is illustrated by an example of an IID standard Gaussian shift in mean from zero to one, i.e., from N(0, 1) to N(1, 1). The probability of accepting H_1 at a given test stage is computed using the proposed procedure, Eq. (22), and compared to a Monte Carlo simulation of CUSUM in Figs. 1 and 2. Figure 1 shows close agreement between the numerical computations and Monte Carlo simulation over the plotted 38 delay stages when a change to H_1 occurs immediately at delay stage 0. Figure 2 shows the probability of CUSUM false alarm, i.e., when H_0 is true. Statistical agreement between Monte Carlo and numerical results is observed to within a 90% confidence interval, given by

CI_90% = 1.645 √( P(1 − P)/W ),

where P ≡ P(choose H_1 at stage k) in Eq. (22) and W is the number of Monte Carlo trials per stage [39]. In Figure 2, a total of 10^6 Monte Carlo trials are used over the 55 delay stages, resulting in W = 10^6/55.
A value of P = 0.005, for example, corresponds to CI_90% = 8.6 × 10^−4, which encompasses the Monte Carlo curve's oscillations.
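The quoted half-width can be reproduced directly; the normal-approximation form with z_0.95 = 1.645 is an assumption consistent with the quoted 8.6 × 10^−4:

```python
import math

def ci90(P, W):
    """90% normal-approximation confidence-interval half-width for a
    Monte Carlo estimate of a probability P from W independent trials
    (the 1.645 quantile form is assumed here)."""
    return 1.645 * math.sqrt(P * (1.0 - P) / W)

# Values used for Fig. 2: P = 0.005 and W = 10**6 / 55 trials per stage.
half_width = ci90(0.005, 10**6 / 55)
print(half_width)   # ~8.6e-4, matching the value quoted in the text
```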

2) REPEATED SPRT ANALYSIS
The CUSUM test (17) may be alternatively expressed as a sequence of repeating SPRTs [8]. This suggests a method for estimating a CUSUM's average run length (ARL) to false alarm and its ARL to detection.
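The equivalence between the maximization form (17) and the one-line recursion (18) underlying this repeated-SPRT view is easy to confirm numerically; the N(0,1)-versus-N(1,1) LLR x − 0.5 used below is an illustrative choice:

```python
import random

random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(500)]   # pre-change (H0) data
llr = [x - 0.5 for x in xs]                          # LLR for N(0,1) vs N(1,1)

def cusum_max(llr, k):
    """Form (17): maximum over all candidate change points j <= k, floored at 0."""
    return max(0.0, max(sum(llr[j:k]) for j in range(k)))

# Form (18): recursion S_k = max(0, S_{k-1} + L_k), with S_0 = 0.
s_rec = [0.0]
for L in llr:
    s_rec.append(max(0.0, s_rec[-1] + L))

for k in (1, 10, 100, 500):
    assert abs(cusum_max(llr, k) - s_rec[k]) < 1e-9
print("maximization and recursive forms agree")
```

The recursion is what makes per-slot CUSUM updates cheap at local sensors: one addition and one comparison per observation, versus the O(k) scan of the maximization form.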
Consider an SPRT with initial statistic G_0 = 0 and lower and upper log-thresholds 0 and h, respectively. Let P_θ(H_1) represent the probability that this SPRT terminates at the upper threshold h when H_θ is true, θ ∈ {0, 1}, and let E_θ[T_0] and E_θ[T_h] denote the average numbers of samples for a single SPRT conditional on the test ending at 0 and at h, respectively, given H_θ. Page [8] showed that the ARL of a CUSUM test given that H_θ is the true hypothesis is

ARL_θ = E_θ[T] / P_θ(H_1) = [ (1 − P_θ(H_1)) E_θ[T_0] + P_θ(H_1) E_θ[T_h] ] / P_θ(H_1).   (24)

In (24), P_0(H_1) and P_1(H_1) represent the probability of false alarm and of correct detection of a single SPRT, respectively, so that (24) provides the ARL to false alarm (θ = 0) and the ARL to detection (θ = 1). The numerator of (24) is computed using (15) and the denominator is computed using (14). In Figures 3-6, the analytical ARL to false alarm and ARL to detection of CUSUM tests are shown for varying detection thresholds and compared to Monte Carlo simulations of CUSUM for a Gaussian shift in mean, a Gaussian shift in variance, a Bernoulli distribution, and the discrete PMF described in the caption of Fig. 6. In each case, the ARL values from the proposed numerical computations show much closer agreement with those obtained from Monte Carlo simulation than those resulting from the approximations of [40], where µ_θ = E_θ[L_1] for θ = 0, 1, which derive from the well-known Wald approximations that neglect overshoot and undershoot at the test threshold boundaries [37]. It is worth noting that the staircase discontinuities observed in Figures 5 and 6 are due to discrete distributions (PMFs), such as those encountered at the fusion center described later.
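Page's identity can be checked by simulation: the ARL of a CUSUM equals the mean length of one SPRT cycle with boundaries 0 and h divided by the probability that the cycle ends at h. The threshold h = 4 and the N(1,1) post-change model below are illustrative assumptions:

```python
import random

random.seed(2)
h = 4.0          # CUSUM detection threshold (illustrative value)

def draw():
    # per-sample LLR for N(0,1) vs N(1,1), with data generated under H1
    return random.gauss(1.0, 1.0) - 0.5

def cusum_rl():
    """Run length of a CUSUM to detection, simulated directly via (18)."""
    s, n = 0.0, 0
    while s < h:
        s = max(0.0, s + draw())
        n += 1
    return n

def sprt_cycle():
    """One SPRT cycle with boundaries 0 and h; returns (ended at h?, length)."""
    g, n = 0.0, 0
    while True:
        g += draw()
        n += 1
        if g >= h:
            return True, n
        if g <= 0.0:
            return False, n

trials = 20000
arl_direct = sum(cusum_rl() for _ in range(trials)) / trials
cycles = [sprt_cycle() for _ in range(trials)]
mean_T = sum(n for _, n in cycles) / trials
p_hit = sum(hit for hit, _ in cycles) / trials
arl_page = mean_T / p_hit        # Page's identity: ARL = E[T] / P(end at h)
print(arl_direct, arl_page)
```

The two Monte Carlo estimates agree to within sampling error, illustrating why single-SPRT quantities computed by the recursion suffice to obtain CUSUM ARLs.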

C. APPLICATION TO DISTRIBUTED SYSTEMS
The CUSUM analysis of Section III-B can be applied to the decentralized systems described in Section II. In such systems, local sensor detection procedures offload FC processing by transmitting only detected changes to the FC. The FC instead receives indicator reports from the M sensors and makes a final decision on the state of the system. When a local sensor sends a report, rather than shutting off, the local sensor is assumed to reinitialize itself and continue processing observations, since observations are still arriving. The previous CUSUM analysis can easily be extended to analyze local sensor CUSUM procedures that reinitialize after reporting. The focus here is on the case of reinitialization to zero, but a similar procedure follows for any reset strategy.
To extend the one-sided SPRT CUSUM analysis of the preceding section, the reinitialization probability P(reinitialize to zero at stage k − 1) is added to the PDF at f_{k−1}(0) prior to observing the next observation. This allows the possibility of a local sensor's CUSUM terminating and reinitializing itself to zero to be accounted for in the subsequent stage's PDF. The probability of choosing H_1 at delay stage k given that stage k is reached, γ_k(H_θ), is then computed using Equation (13) for a CUSUM test that, after each observation, has some probability of transmitting a report and reinitializing itself to zero. The analysis of the repeating local CUSUM is summarized in Algorithm 1.

Algorithm 1 is applied to compute the probability mass function of received reports at the FC as follows. Define P(r_{k+1} = n | H_{θ,k+1}), θ ∈ {0, 1}, as the probability of receiving n reports simultaneously at the fusion center given hypothesis θ at time slot k + 1. Under steady-state conditions, and when n ≪ M, these probabilities can be approximated by treating sensor reports as independent and identically distributed across sensors:

P(r_{k+1} = n | H_{θ,k+1}) ≈ (M choose n) p_θ^n (1 − p_θ)^{M−n},   (28)

where p_θ is the steady-state probability, computed by Algorithm 1, that an individual sensor reports in a given time slot under H_θ. The steady-state approximation ignores the small subset of sensors that may have recently reset after having reported H_1 to the FC. The resets introduce time variation into the FC report sequence r_k, k = 1, 2, . . ., violating the IID assumption implicit in (28). In the scenarios of interest, where the number of sensors M is large and false alarm rates are low, the number of simultaneous reports n ≪ M, and the low proportion of reset sensors therefore has a negligible effect on the overall reporting probabilities. This is investigated later on.
From (28), the probability that the number of received reports exceeds a specified bandwidth constraint of C ∈ [1, M ] sensor-to-FC communication channels can be estimated.
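This estimate can be sketched under the binomial reading of the steady-state approximation, i.e., treating sensor reports as independent Bernoulli trials across sensors; the per-slot report probability p = 0.02 and M = 13 are illustrative assumptions:

```python
from math import comb

def report_pmf(n, M, p):
    """P(n simultaneous reports) under the steady-state approximation,
    treating the M sensors' reports as independent Bernoulli(p) trials."""
    return comb(M, n) * p**n * (1 - p)**(M - n)

M, p = 13, 0.02            # illustrative network size and per-slot report prob.
pmf = [report_pmf(n, M, p) for n in range(M + 1)]

# Probability that arrivals exceed a bandwidth constraint of C = 3 channels.
p_overflow = sum(pmf[n] for n in range(4, M + 1))
print(pmf[0], pmf[1], p_overflow)
```

For these values the overflow probability is on the order of 10^−4, illustrating why a modest number of channels C suffices when n ≪ M.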
To illustrate the above, Figures 7(a)-(c) use (28) to plot the probabilities of the FC simultaneously receiving one, two, and three reports, respectively, given H_1 for a Gaussian shift-in-mean scenario, while Fig. 7(d) plots the probability of receiving one report given H_0. These are compared against Monte Carlo simulations of the 13-sensor distributed system. The probabilities are plotted as a function of absolute time, k, from the start of system operation. The change is assumed to have occurred at time k = 0 under H_1, and to have never occurred under H_0. It is observed that after an initial transient period, during which the values of P(r_{k+1} = n | H_θ) are time varying, a steady-state value is quickly reached. Transient periods occur both immediately after a change between H_0 and H_1, as in Fig. 7(a)-(c), and at the start of system operation, as in Fig. 7(d). As expected, under conditions where the number of simultaneous reports is small relative to the number of sensors, the frequency of occurrence of the transient reset periods is negligible and the proposed numerical computations closely approximate the reporting probabilities of local sensors observed by Monte Carlo simulation.

FIGURE 7. Comparison between numerically computed and Monte Carlo generated P(r_{k+1} = 1|H_{1,k+1}), P(r_{k+1} = 2|H_{1,k+1}), P(r_{k+1} = 3|H_{1,k+1}) and P(r_{k+1} = 1|H_{0,k+1}) for N(0, 1) vs. N(0.5, 1) with h_l = ln(15) and M = 13.

IV. LOCAL SENSING ALGORITHM
To design the distributed decentralized system, both the FC and local sensors need to ensure that the global false alarm rate constraint is satisfied. Local sensors also require a decision policy that minimizes the delay in detecting a change in distribution. Without specifying an a priori change-time distribution, the CUSUM algorithm (17), with its minimax optimality [10] and the simplicity of having a single design threshold parameter, h_l, is proposed for local sensing. If the test statistic S_k^i exceeds this threshold at time k, a binary report is generated by local sensor i and transmitted to the fusion center:

r_k^i = 1 if S_k^i ≥ h_l, and r_k^i = 0 otherwise.   (29)

The generation of one-bit local sensor reports quantizes the observations accumulated by local sensors into indicators of whether a local sensor CUSUM has terminated at each time k. Dropping superscripts, when an arbitrary local sensor's CUSUM terminates, it transmits a report to the FC and its statistic, S_k, is reinitialized to zero. It is worth noting that reinitialization to S_k = 0 favours H_0 a priori, while reinitialization to S_k = h_l favours H_1 a priori. Reinitializing to S_k = 0 is therefore the most conservative reinitialization strategy for detecting a change, resulting in the smallest possible FC false alarm rate and the largest FC detection delay among all possible resetting strategies.
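The per-slot behaviour of a resetting local sensor can be sketched as follows; the N(1,1) post-change model, the LLR x − 0.5, and h_l = 3 are illustrative assumptions:

```python
import random

def local_sensor(observations, llr, h_l):
    """Yield the one-bit report r_k for each time slot: the local CUSUM
    statistic follows S_k = max(0, S_{k-1} + L_k); a report is sent when
    S_k >= h_l, after which the statistic resets to zero and keeps monitoring."""
    s = 0.0
    for x in observations:
        s = max(0.0, s + llr(x))
        if s >= h_l:
            yield 1        # report a detected change to the FC
            s = 0.0        # reinitialize (reset-to-zero strategy)
        else:
            yield 0

# Illustrative post-change (H1) stream: N(1,1) data, LLR x - 0.5, h_l = 3.
random.seed(3)
post_change = [random.gauss(1.0, 1.0) for _ in range(200)]
reports = list(local_sensor(post_change, lambda x: x - 0.5, h_l=3.0))
print(sum(reports), "reports in", len(reports), "slots")
```

Each slot costs one addition, one comparison, and (via a lookup table for the LLR) no density evaluations, consistent with the minimal per-sensor computation claimed below.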
It is also noted that, from CUSUM properties, the design choice of reinitialization to S_k = 0 is equivalent to performing likelihood ratio tests for all possible change times in the future, i.e., a union of likelihood ratio tests, which serves to guarantee a lower bound on the ARL to FA when designing local sensing tests.
Computationally, in deciding whether to generate a report via Eq. (29), each local sensor computes Eq. (18) in each time slot, with the per-sample likelihood ratio computable by a simple table lookup. The per-sensor computation is therefore minimal. Moreover, the frequency of communication incurred when r_k^i = 1 can be adjusted through h_l according to performance requirements in the proposed test design, which also depends on the fusion center algorithm, described next.
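The per-slot local computation can be sketched concretely. This is a minimal illustration, not the paper's implementation: the grid range, bin count, and the closed-form shift-in-mean LLR ln[f_1(x)/f_0(x)] = μx − μ²/2 for N(μ, 1) vs N(0, 1) are assumptions for this example. The per-sample LLR is precomputed into a lookup table and the CUSUM update is performed recursively.

```python
import math

def build_llr_table(mu=0.5, x_min=-4.0, x_max=4.0, n_bins=256):
    # Precompute the per-sample LLR for N(mu, 1) vs N(0, 1) on a quantized grid.
    step = (x_max - x_min) / n_bins
    table = [mu * (x_min + (j + 0.5) * step) - mu * mu / 2.0 for j in range(n_bins)]
    return table, x_min, step

def llr_lookup(x, table, x_min, step):
    # Clamp out-of-range observations to the edge bins.
    j = min(max(int((x - x_min) / step), 0), len(table) - 1)
    return table[j]

def local_sensor_step(S, x, table, x_min, step, h_l=math.log(15)):
    """One recursive CUSUM update; returns (new statistic, report bit r_k)."""
    S = max(0.0, S + llr_lookup(x, table, x_min, step))
    if S >= h_l:
        return 0.0, 1   # transmit a report and reset the statistic to zero
    return S, 0
```

Each slot thus costs one table lookup, one addition, and two comparisons per sensor, consistent with the claim that per-sensor computation is minimal.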

V. FUSION CENTER ALGORITHM
At the fusion center, local sensor reports are received sequentially in time. At each time slot, the fusion center receives r_k = Σ_{i=1}^{M} r_k^i ∈ {0, 1, 2, . . . , C} local sensor reports, where C ≤ M is the maximum number of simultaneous reports allowed in the system. When the report generation policy at the local sensors is fixed, the number of reports arriving at the FC can be described by a probability mass function (PMF) under H_0 or H_1.
When bandwidth constraints prevent more than C reports from being received at the FC in a single time slot, simultaneous arrivals of between C and M reports are capped at C. This constraint is realistic for large sensor networks, where the number of reports generated in a single time slot could exceed the communication bandwidth to the FC.
In Section III-C, Algorithm 1 was proposed to compute the probabilities of receiving multiple reports at the FC of a distributed system based on a repeating CUSUM with reinitialization to zero after choosing H_1. These probabilities, though time varying, approach steady-state values quickly after startup and after a change in distribution.
The repeating CUSUM analysis method, Algorithm 1, is also applied to compute the PMFs describing report arrivals at the FC, denoted by R_θ(k, n), θ ∈ {0, 1} for H_0 and H_1, using the steady-state approximation (28) whenever the number of simultaneous reports is small compared to M, and assuming that the probabilities are zero (negligible) otherwise. Specifically, let J be a constant chosen such that J ≪ M, e.g., J < M/5. The steady-state report arrival PMF is then expressed as

R_θ(k, n) = P(r_{k+1} = n | H_θ) / N_0 for n ≤ J, and R_θ(k, n) = 0 otherwise,

where θ ∈ {0, 1} and the normalizing factor N_0 = Σ_{i=1}^{J} P(r_{k+1} = i | H_θ). It is worth noting that the extreme case of a bandwidth constraint of C = 1 results in a Bernoulli distribution.
Reports arriving at the FC described by the above PMFs are processed using CUSUM, with fusion center test statistic

Z_k = (Z_{k-1} + ln(R_1(k, r_k)/R_0(k, r_k)))^+,   (31)

where (y)^+ ≡ max(y, 0). The FC decides that a change has occurred at the first time k where the realization of observed reports, z_k, of Z_k in (31) exceeds the FC detection threshold h_f. This procedure is described by Algorithm 2, which also includes local sensing. Computationally, at each time slot, the fusion center accumulates the total number of received reports up to the network's bandwidth limit and performs the CUSUM test of Eq. (31), which can be computed recursively in a manner similar to the local tests via Algorithm 2, Step 9.
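The FC side can be sketched in a few lines. This is an illustrative stand-in under stated assumptions: the arrival PMF values used in the example are hypothetical (they are not taken from the paper's computations), and the truncation follows the J ≪ M steady-state approximation described above.

```python
import math

def truncate_pmf(p, J):
    """Steady-state approximation: keep report counts n <= J (J << M),
    zero the negligible tail, and renormalize."""
    q = [p.get(n, 0.0) for n in range(J + 1)]
    Z = sum(q)
    return [v / Z for v in q]

def fc_cusum_step(z, n_reports, R1, R0, C):
    """One FC CUSUM update, Eq. (31)-style, on the bandwidth-capped report count."""
    n = min(n_reports, C)   # at most C simultaneous reports can be received
    return max(0.0, z + math.log(R1[n] / R0[n]))

# Hypothetical arrival PMFs under H0 and H1 (illustrative values only):
R0 = truncate_pmf({0: 0.95, 1: 0.04, 2: 0.01}, J=2)
R1 = truncate_pmf({0: 0.60, 1: 0.30, 2: 0.10}, J=2)

z = 0.0
for n in [1, 0, 3]:          # three slots; the 3 is capped to C = 2
    z = fc_cusum_step(z, n, R1, R0, C=2)
# z is now about 3.86, exceeding h_f = ln(30) ~ 3.40, so the FC declares H1
```

Note that a burst of reports drives z_k over the threshold even though one quiet slot partially decays the statistic, mirroring the CUSUM behaviour of the local tests.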
Threshold Design Tradeoffs: In order to determine the FC CUSUM threshold h_f, the repeated SPRT analysis from Section III-B2 can be applied. Given a (h_l, h_f) (local sensor, global FC) threshold pair, the FC's expected detection delay and false alarm rate are then determined, noting that the report arrival PMFs under H_1 and H_0 input to the FC are determined by h_l. In principle, therefore, a target global system false alarm rate can be achieved by iteratively adjusting h_l and/or h_f. These two degrees of freedom for specifying the overall false alarm rate enable the following flexibility in system design. When h_l is small, communication is cheap in cost and bandwidth and the system behaves like a centralized detector (with quantization); performance is determined by h_f and detection delay is low. Alternatively, setting h_l to a large value corresponds to high bandwidth cost and, if sufficiently large, results in or-ing (fully distributed) detection behaviour with identical global and local false alarm probabilities and maximum detection delay.

Algorithm 2 (excerpt):
 …    if g_k^i ≥ h_l:
 7:     n = n + 1 % add report
 8:   end for
 9:   z_k = (z_k + ln(R_1(k, n)/R_0(k, n)))^+
 10:  k = k + 1 % step time
 11: end while
 12: % where (y)^+ ≡ max(y, 0)

VI. SYSTEM DESIGN PROCEDURE
Consider a distributed system with M sensors. The system design requires that R_FA^FC ≤ α_FC, a given FC false alarm probability, with the objective of minimizing the delay E_DD^FC. In other words, the design objective (5) becomes the minimization of E_DD^FC subject to R_FA^FC ≤ α_FC. For M sensors, α_FC can be related to a local sensor minimum false alarm rate constraint, α_l, via (34). To avoid overdesign and keep the delay minimal, local sensors should achieve a local false alarm rate R_FA^l no less than the constraint, i.e.,

R_FA^l ≥ α_l.   (35)

If a local sensing threshold h_l is chosen such that R_FA^l ≈ α_l in (35), the FC should terminate and choose H_1 after only one report in order to ensure R_FA^FC = α_FC. This value of h_l serves as the upper bound. For values of h_l that result in R_FA^l > α_l, more than one report must be processed before choosing H_1 at the FC in order to satisfy R_FA^FC = α_FC. If a value of h_l is selected that causes the local sensors to violate (35), then for any choice of h_f the proposed distributed system will be overdesigned in terms of false alarm rate, resulting in greater delay.
A suboptimal solution is therefore proposed: given the value of α_l from (34), the maximum value of h_l that satisfies (35) is computed using the CUSUM analysis of Section III-B2 within a discretized search procedure. Once this value of h_l is determined, its associated reporting PMFs are computed using Algorithm 1, and the FC threshold h_f that brings R_FA^FC as close as possible to α_FC is then determined by treating the design as a shift-in-PMF test, as described in Section V. A 1-D search is then performed as follows: compute and store the E_DD^FC and R_FA^FC of the given (h_l, h_f) pair, decrease h_l, and repeat.
The choice of a final (h l , h f ) pair to design local sensors and the FC is based on system requirements. For example, if the only design goal is to minimize expected detection delay while meeting a maximum false alarm rate, the threshold pair with minimum E FC DD is selected. If, for example, energy consumption at the sensors is a secondary concern compared to bandwidth within the network, it may be desirable to reduce the number of report transmissions between local sensors and the FC. In this case, the threshold pair with largest possible h l is chosen so that the reports are most informative, and thus fewer reports are needed for a decision.
Given a system with M sensors, C parallel channels and false alarm probability constraint α_FC, the design procedure can be summarized by the following steps:
1) Using Eq. (34), convert α_FC to a lower bound on the local sensor false alarm rate constraint, α_l.
2) Until R_FA^l is sufficiently close to α_l, compute R_FA^l for different local sensor thresholds, h_l, using the procedure in Section III-B2. The result is the maximum value of h_l to be used in Step 3.
3) With the current h_l, perform Algorithm 1 and (28) to determine the reporting PMFs at the FC, R_0(r_k) and R_1(r_k), r_k = 0, 1, · · · , C, corresponding to H_0 and H_1, respectively.
4) Solve the FC report arrival discrete shift-in-PMF problem: until R_FA^FC is sufficiently close to α_FC, vary h_f and compute E_DD^FC and R_FA^FC using the PMFs R_1(r_k) and R_0(r_k) from Step 3 and the analysis in Section III-B2.
5) Record (E_DD^FC, R_FA^FC) for the threshold pair (h_l, h_f).
6) Decrease h_l and go to Step 3.
It is worth remarking that the proposed system design has the following advantages over existing approaches in scaling to large sensor networks:
• The CUSUM analysis of Section III-B accounts for sequential test overshoot in the design. This accuracy avoids overdesigning error specifications, which would result in excess detection delay. As the number of sensors, M, in the distributed system increases, the larger number of overshoot events has a cumulative effect.
• The design complexity is determined by the desired accuracy of the convolutions, which is proportional to the square of the number of bins in the discretized PDFs (typically 100) multiplied by the number of stages, N, in Eq. (16), with typical values in the range 10-50. Thus, the cost of test design does not increase appreciably with network size M. Increasing M only increases the complexity of computing Eq. (28), which can be approximated for large M by a closed-form expression using Stirling's formula.
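Steps 4 and 5 of the procedure can be sketched numerically. The following is a hedged stand-in for the Section III-B2 analysis, not a reproduction of it: it treats the FC input as i.i.d. draws from given arrival PMFs (the shift-in-PMF abstraction of Section V) and uses Monte Carlo to estimate the expected detection delay under H_1 and the ARL to false alarm under H_0 for a candidate h_f. The function name and the PMF values in the example are illustrative assumptions.

```python
import math
import random

def fc_delay_and_arl(R0, R1, h_f, n_trials=500, seed=7):
    """Monte Carlo estimate of FC mean stopping time when the per-slot report
    count is drawn i.i.d. from the H1 arrival PMF (detection delay) or from
    the H0 arrival PMF (average run length to false alarm)."""
    rng = random.Random(seed)
    support = list(range(len(R0)))

    def run(pmf):
        z, k = 0.0, 0
        while z < h_f:                      # FC CUSUM recursion on report counts
            n = rng.choices(support, weights=pmf)[0]
            z = max(0.0, z + math.log(R1[n] / R0[n]))
            k += 1
        return k

    e_dd = sum(run(R1) for _ in range(n_trials)) / n_trials
    arl_fa = sum(run(R0) for _ in range(n_trials)) / n_trials
    return e_dd, arl_fa

# Illustrative arrival PMFs (not from the paper) and h_f = ln(30):
e_dd, arl_fa = fc_delay_and_arl([0.95, 0.04, 0.01], [0.60, 0.30, 0.10], math.log(30))
```

Repeating this evaluation over a grid of (h_l, h_f) pairs, with the PMFs recomputed for each h_l, yields the delay/false-alarm operating points recorded in Step 5.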

VII. NUMERICAL RESULTS
The proposed test design procedure for distributed detection presented in Section VI is assessed for the accuracy of its expected detection delay and false alarm rate in Section VII-A. Tradeoffs among choices of fusion center thresholds, numbers of sensors and numbers of parallel channels (bandwidth) are provided in Section VII-B, and comparisons to existing schemes are discussed in Section VII-C. Application to change-in-variance detection is presented in Section VII-D, and performance in a multiple-alternative change scenario is presented in Section VII-E.

A. DESIGN ACCURACY
To illustrate the accuracy of the proposed design procedure, a Gaussian shift-in-mean test of N(0, 1) vs N(0.5, 1) is investigated under different combinations of the parameters h_l, M, and C. Each combination results in a unique report arrival PMF from the local sensors, which is computed using Algorithm 1. The procedure in Section III-B2 was then used to compute E_DD^FC and R_FA^FC for each candidate threshold h_f. Monte Carlo trials simulating the distributed networks were performed to evaluate the accuracy of the proposed procedure in predicting E_DD^FC and R_FA^FC. This comparison is shown for 25-, 15-, and 7-sensor networks with a bandwidth limit of C = 3 simultaneous reports in Figures 8, 9, and 10, respectively. Closer agreement between the proposed analysis and Monte Carlo simulation is observed in Fig. 8 for M = 25 sensors than in Figs. 9 and 10 for M = 15 and 7 sensors, respectively. This is expected since the discrepancy is due to the steady-state approximation described in Section III-C, noting that for M = 25 there is a lower occurrence of reset transients in the report arrival process. Examining the performance tradeoffs among these three scenarios, increasing the number of sensors from 7 to 15 to 25 decreases the expected delay from 8.5 to 5.0 to 3.9, respectively, at a log false alarm rate of −3.5, or 3.16 × 10^−4.
The significant staircase discontinuities mentioned in Section III-B2 are also observed in all of these results, arising from the discrete report arrival distributions (PMFs) with at most C + 1 non-zero terms. When C is small, the discretization of the observation distributions causes discontinuities in the achievable false alarm rates of the FC design under different values of h_f. This is clearly seen in Figs. 8-9. At discontinuities, randomization could be used at the FC detector in order to achieve intermediate desired false alarm rates. The largest discontinuity in these scenarios occurs at low h_f values, corresponding to FCs that terminate after one report versus those that terminate after more than one report.
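The randomization mentioned above is the standard randomized-test device: with probability p the FC operates at the stricter of two adjacent achievable designs, and with probability 1 − p at the looser one, so that the average false alarm rate meets the target exactly. A minimal sketch follows; the numerical rates in the example are illustrative, not values from the figures.

```python
def randomization_weight(fa_strict, fa_loose, fa_target):
    """Probability p of selecting the stricter design so that the mixture
    achieves fa_target = p*fa_strict + (1 - p)*fa_loose exactly."""
    assert fa_strict <= fa_target <= fa_loose
    return (fa_loose - fa_target) / (fa_loose - fa_strict)

# e.g. two achievable rates 1e-4 and 1e-3 bracketing a target of 3.16e-4:
p = randomization_weight(1e-4, 1e-3, 3.16e-4)   # p = 0.76
```

The resulting expected delay is likewise the p-weighted mixture of the two designs' delays, which is why randomization fills in the staircase between operating points.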

B. BANDWIDTH / DELAY PERFORMANCE TRADEOFFS
The performance of the proposed distributed quickest detection framework, characterized by the expected detection delay achieved at a given false alarm rate, is investigated in Figure 11, which considers an N(0, 1) vs N(0.5, 1) test with M = 13 sensors as a function of the bandwidth constraint C. The difference in system performance between C = 1 and C = 2 is significant, since two reports are often received simultaneously at the FC in this scenario, and there is a noticeable gain from the proposed framework's ability to distinguish between the numbers of reports received.
The difference between C = 2 and C = 3, in contrast, is marginal because of the rarity of receiving three reports simultaneously in this scenario. From this analysis, it can be concluded that a bandwidth constraint of C = 2 is sufficient to obtain most of the performance benefits. A greater improvement from C = 1 to C = 2 is observed as the false alarm rate constraint R_FA^FC decreases: a tighter false alarm rate constraint results, on average, in a greater number of reports being required by the FC to decide on the change to H_1. Figure 12 shows results for the scenario described above as the number of sensors, M, is varied with the bandwidth fixed at C = 3. As expected, for a given false alarm rate, increasing the number of sensors reduces delay due to the greater observation rate. These differences are magnified as the false alarm rate decreases. For example, at a global false alarm rate of 10^−3, increasing the number of sensors from 4 to 13 can reduce the expected delay by a factor of nearly 3.

C. COMPARISONS TO EXISTING SCHEMES
The proposed distributed detection scheme is not directly comparable to others in the literature due to differing network and/or communication assumptions between local sensors and the FC. A recent approach for a similar type of sensor network is the one-shot type, which has severe communication constraints. Fig. 11 includes Monte Carlo simulation results for the m-th alarm scheme proposed in [29], which makes an H_1 decision upon receipt of m local CUSUM change detection reports at the FC, for the case of M = 13 sensors. In Fig. 11, at the same false alarm rate, the m-th alarm scheme exhibits significantly higher expected delay curves, as shown for m = 1, 2, and 3. This is expected since the m-th alarm scheme minimizes the total number of reports transmitted to the FC, whereas the proposed scheme limits the communication bandwidth (available channels), i.e., the number of simultaneous reports rather than the total number.

FIGURE 11. Delay performance of the decentralized CUSUM system for N(0, 1) vs N(0.5, 1) with h_l = ln(15) and M = 13 sensors as a function of false alarm rate under varying bandwidth constraints C, with comparison to the m-th alarm scheme for m = 1, 2 and 3 and majority consensus schemes [29].

FIGURE 12. Delay performance of the decentralized CUSUM system for N(0, 1) vs N(0.5, 1) with h_l = ln(15) for different numbers of sensors, compared with the First Alarm and Majority Consensus schemes with M = 4 sensors in [29], vs false alarm rate.
For the case of fewer sensors, M = 4, Fig. 12 reveals that for both the m-th alarm and majority consensus schemes in [29], expected delays are somewhat higher than those of the proposed scheme. This is expected, since communication cost limits performance more significantly as the number of sensors increases. Previously proposed one-shot schemes are also limited by having less information: upon making their decisions, local sensors are shut off in [29], whereas the proposed scheme continues to utilize observations arriving at local sensors, so that local sensing can instead be viewed as filtering and/or quantization. An alternative strategy is centralized detection, where all information is sent to the FC. A centralized system with a CUSUM FC (not shown in Fig. 11) has expected delays ranging from 1.8 to 2.5 samples for log_10(R_FA) in the range −2.0 to −3.0 [41], far less than the proposed scheme. This is expected since centralized detection does not reduce the information communicated from local sensors to the FC.
In summary, the proposed approach allows for flexibility in communication, controlled by the design thresholds h_l and h_f. A combination of large h_l and small h_f yields less communication and greater expected delay at a given overall false alarm rate, resulting in performance similar to the methods in [29]. Conversely, small h_l and large h_f increase communication and lower the expected delay, with performance approaching that of a centralized system with no communication restriction.

D. CHANGE-IN-VARIANCE SCENARIO
Next, the same design procedure is applied to the problem of detecting a change in the variance of a zero-mean Gaussian signal from 1 to 2, using a 25-sensor network under a bandwidth restriction of 3 simultaneous reports. In contrast to the shift in mean of a deterministic signal featured in the previous sections, this scenario models an inherently stochastic signal. It represents detection of a change in observed energy, which may model the signal prediction error when monitoring the behaviour of a stochastic system with independent prediction errors. As observed in Fig. 13, Monte Carlo simulation closely matches the analytical performance of the proposed design. As expected, the low value of C = 3 reports results in significant discretization (staircase) effects on the achievable false alarm rates, for the reasons explained at the end of Section VII-A.
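For reference, the per-sample LLR used by the local CUSUMs in this scenario follows directly from the two zero-mean Gaussian densities; the sketch below is the standard closed form, with the 1-vs-2 variances of this section as defaults.

```python
import math

def llr_variance(x, s2_0=1.0, s2_1=2.0):
    """ln[f1(x)/f0(x)] for N(0, s2_1) vs N(0, s2_0). For s2_0 = 1, s2_1 = 2
    this reduces to x**2 / 4 - 0.5 * ln(2)."""
    return 0.5 * math.log(s2_0 / s2_1) + 0.5 * x * x * (1.0 / s2_0 - 1.0 / s2_1)
```

As required of a CUSUM increment, its mean drift is negative under H_0 (E[x²] = 1 gives 1/4 − 0.5 ln 2 < 0) and positive under H_1 (E[x²] = 2 gives 1/2 − 0.5 ln 2 > 0), so the statistic hovers near zero before the change and ramps up after it, capturing the change in observed energy.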

E. MULTIPLE-ALTERNATIVE SCENARIO
To illustrate a multiple-alternative change scenario, Fig. 14 shows the performance of a change-of-variance test of N(0, 1) versus N(0, θ) for θ ∈ [2, 3] with M = 25 sensors and a limit of C = 3 simultaneous reports. The test thresholds h_l = ln(60) and h_f = ln(450) were obtained by designing for the worst-case false alarm value corresponding to θ = 2, though other criteria based on the set of H_1 hypotheses could be used. Note that the false alarm probability in this example is unaffected by θ, so the design for θ = 2 meets the specifications for the θ > 2 alternatives. As illustrated by the agreement with Monte Carlo simulation, the proposed analysis is applicable to hypotheses with arbitrary PDFs, independent over time, that need not match those used in the nominal design.
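The robustness claim can be probed with a small Monte Carlo sketch for a single local sensor; treating one sensor in isolation is a simplification of the full network, and only the nominal design parameters (N(0, 1) vs N(0, 2), h_l = ln(60)) come from this section. A CUSUM designed for θ = 2 detects an actual θ = 3 change even faster, since the mismatched increment has a larger positive drift.

```python
import math
import random

def cusum_delay(theta_true, h=math.log(60), n_trials=400, seed=3):
    """Mean detection delay of one local CUSUM built for N(0, 1) vs N(0, 2)
    when the true post-change variance is theta_true (possibly mismatched).
    Increment: x^2/4 - 0.5*ln(2), the nominal-design LLR."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        s, k = 0.0, 0
        while s < h:
            x = rng.gauss(0.0, math.sqrt(theta_true))
            s = max(0.0, s + x * x / 4.0 - 0.5 * math.log(2.0))
            k += 1
        total += k
    return total / n_trials
```

Comparing `cusum_delay(2.0)` with `cusum_delay(3.0)` shows the delay shrinking for the larger variance, consistent with the observation that the θ = 2 design meets the specifications for θ > 2 alternatives.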

VIII. CONCLUSION AND FUTURE WORK
A distributed quickest change detection system is proposed featuring sensors that reset after making decisions and continue to process future data, rather than operating on a one-shot basis. The proposed minimax approach, based on CUSUM for independent observations, is shown to provide greater communication constraint flexibility than existing systems such as one-shot or centralized detection. The performance analysis applied to test design closely meets global false alarm specifications while satisfying bandwidth restrictions on the number of simultaneous reports received by the FC from local sensors in any time slot. The numerical procedures developed in Section III analyze local sensor operation by computing probability distributions of the numbers of summary reports arriving at the FC. This information is used to design the FC and local sensor thresholds. The proposed approach enables quantitative tradeoffs of network size and sensor communication bandwidth against delay performance among systems with the same global false alarm rate. Comparisons with existing systems show that the proposed system outperforms low-communication one-shot detection and is outperformed by high-communication centralized detection, as expected. In the future, the proposed method could be extended to include the design of alternative local sensor quantization schemes beyond the binary one used here.