Bayesian Quickest Detection of Propagating Spatial Events

Rapid detection of spatial events that propagate across a sensor network is of wide interest in many modern applications. In particular, in communications, radar, IoT, environmental monitoring, and biosurveillance, we may observe propagating fields or particles. In this paper, we propose Bayesian sequential single and multiple change-point detection procedures for the rapid detection of such phenomena. Using a dynamic programming framework we derive the structure of the optimal single-event quickest detection procedure, which minimizes the average detection delay (ADD) subject to a false alarm probability upper bound. The multi-sensor system configuration is arbitrary and sensors may be mobile. In the rare event regime, the optimal procedure converges to a more practical threshold test on the posterior probability of the change point. A convenient recursive computation of this posterior probability is derived by using the propagation characteristics of the spatial event. The ADD of the posterior probability threshold test is analyzed in the asymptotic regime, and specific analysis is conducted in the setting of detecting random Gaussian signals affected by path loss. Then, we show how the proposed procedure is easy to extend for detecting multiple propagating spatial events in parallel in a multiple hypothesis testing setting. A method that provides strict false discovery rate (FDR) control is proposed. In the simulation section, it is demonstrated that exploiting the spatial properties of the event decreases the ADD compared to procedures that do not utilize this information, even under model mismatch.


I. INTRODUCTION
Sequential change-point detection, often referred to as quickest detection, is a fundamental statistical inference task [1]- [9]. It is encountered in numerous applications, such as Internet of Things, environmental monitoring, biosurveillance, finance, radar, and wireless communications. Sensor networks are commonly used to rapidly detect a disruption or an event in the monitored physical enviornment [10]- [15]. Usually, in these sensor networks the sensors communicate with a fusion center (FC) or a cloud that performs statistical inference tasks based on the data or local statistics from the sensors. The network setting can be centralized [10], [11] where the FC has access to all the data from the sensors or decentralized [12]- [17] where the sensors perform local computations/inferences and may only send the results, e.g. some sufficient statistic, to This work was partly supported by the Academy of Finland projects: (1) Statistical Signal Processing Theory and Computational Methods for Large Scale Data Analysis (2) WiFiUS project: Secure Inference in the Internet of Things.
T. Halme, E. Nitzan, and V. Koivunen are with the Department of Signal Processing and Acoustics, Aalto University, Espoo, Finland, e-mail: topi.halme@aalto.fi, eyal.nitzan@aalto.fi, and visa.koivunen@aalto.fi. the FC. Recently, quickest detection of multiple change points in parallel has gained wide interest [11], [18]- [21]. Parallel multiple change points can be caused, for example, by multiple active radio transmitters, multiple sound sources, multiple radar targets, or multiple emitters of polluting particles.
In many cases, the event causing the change in the environment has some spatial properties. The event can be a moving target that appears in a surveillance system, propagating radio frequency or audio signals impinging distributed sensors, pollution emanating from a malfunctioning device, or the onset of an epidemic. The effect of the spatial event on a specific area of the network can be modeled in different ways depending on the underlying physical phenomenon of interest and parameters of the sensor system observing it, such as the sensor system configuration and sampling rate. For example, the event may instantaneously affect the sampling distributions of all the sensors in the vicinity of the event [8], [12]. Alternatively, the disruption may propagate across the sensors in the network or in some cluster of the network over a short time period [10], [15], [22]. Examples are propagation of polluting particles in environmental monitoring, seismic activity in earthquake monitoring, propagation of radio waves through space in communication or radar systems, and epidemic traveling wave in biosurveillance.
Several works have considered quickest detection while incorporating spatial information. In [10], Bayesian quickest detection was considered where the sensors are numbered and located with regular geometry and uniform displacements. The initial origin of the disruption was known to be at the first sensor and the disruption propagates through all the sensors as a Markov process in an order determined by the numbering. Extension of [10] to the case where the first sensor experiencing the initial change is unknown was proposed in [23]. Given the sensor observing the disruption first, a predetermined change propagation trajectory was assumed across the sensors. The work in [15] considered a similar setting to [10], [23], with the difference that the change propagation pattern is assumed to be unknown. In addition, both centralized and decentralized settings were considered. In [22], Bayesian continuous-time single change-point detection with sensor networks was studied. The event was assumed to occur at a random time instant in a random location and gradually propagate through the sensor network with unknown velocity triggering interdependent change points. A numerical procedure was proposed to approximate the optimal but intractable Bayesian solution based on the approximated posterior probabilities. Non-Bayesian change-point detection with spatial information under different setups has been stud-ied in [24]- [33]. In [27], a setting where a moving anomaly affects one sensor at a time was considered. As the disruption moves over the network, the affected sensor changes with time according to either a known or an unknown probability model. Rapid detection of propagating phenomena has recently also been studied within the learning and adaptation framework [34]. In [34], a fully-flat network without a central unit is deployed for monitoring. Sensors must exchange information with their neighbors in order to accurately estimate the true state of nature in their vicinity. Due to the more complex network communication topology and interaction and information exchange among the neighboring sensor nodes, obtaining strong performance guarantees in terms of detection delay and false alarm rate, as is the goal under a quickest detection formulation, is difficult. Moreover, the delays in exchanging information among sensors may be significant compared to the propagation speed of the monitored phenomenon. Nonetheless, an information diffusion scheme that results in each sensor detecting changes faster than sensors working in isolation was derived.
In this paper, we propose a Bayesian method for the detection of a propagating spatial event. A discrete-time model is used in acquiring observations. It is assumed that the event propagates in a two-dimensional plane with area radius that increases either randomly or deterministically based on the laws of physics. This wavefront propagation model is illustrated in Fig. 1. It is relevant, for example, in wireless communications where radio waves that carry information are emitted from a transmitter, propagate through space with velocity equal to the speed of light, and detected by a receiver [35,Ch. 3]. The considered propagation model and change detection setup are also relevant in radar and seismic monitoring and localization applications where, for example, a point source or a target is generating or reflecting waveforms that propagate across a sensor array as a plane wave [36], [37]. Another application is in biosurveillance applications that attempt to detect outbreaks of epidemics [22], [38]- [40]. The sensors of the network take observations sequentially in discrete time slots. At each time slot, sensors that are located outside the disruption area obtain observations that follow a common null distribution (no signal present). Sensors that are located within the disruption area obtain observations that obey alternative distributions that may be different among the exposed sensors. We are interested in detecting the initial event as quickly as possible subject to statistical constraints on the rate of false alarms. We assume that the number of sensors and their locations are known at each time slot but may change in time, e.g. a mobile wireless sensor network [41] where the sensors might correspond to e.g. smart phones or drones. Since the main focus of this work is on detection of spatially localized events, it is assumed that all sensors are able to communicate individually with a common FC or cloud. The FC could be a base station (BS) serving different users and sensors in its coverage area. Modern wireless systems such as 5G have reasonably small coverage areas because of the higher frequencies they are using. Separate mobile access points that traverse the network and collect information from the sensors, as in the SENMA framework [42], [43], are not required. Obviously, there exists a wide area of applications where such access points are useful, but providing guarantees on the detection performance may be difficult in those settings. O A n (1) A n (2) Source Sensor  The related works described earlier do not explicitly and jointly take into account the sensor locations, the displacement between the sensor location and the location of the disruption source, and potential sensor mobility. In particular, to the best of our knowledge, the problem of quickest detection in discrete time in the practically relevant scenarios where the change is caused by a gradually expanding spatially localized event(s) has not been addressed earlier. The most related previous work ( [10], [15], [23], [27]- [29]) focuses on settings where the dynamics of the change-event are modeled as movement from sensor(s)-to-sensor(s). The particular assumed sensor network topology (e.g. an array [10] or a graph [27]) results jointly from the spatial properties of the event, and the placement of the sensors. The event is then assumed to affect the sensors according to a model specified by this topology. In contrast, in this paper we consider potentially mobile sensor systems with completely arbitrary displacements and no regular sensing geometry. Therefore it is not in general possible to describe the propagation of the event with any fixed network topology or model. Whether a particular sensor is affected depends on its location relative to the source of the physical event, and the sensor locations may be arbitrary and change over time. Hence, the previous works cannot directly be applied to quickest detection of spatially phenomena emanating from a distinct and potentially unknown location, especially when sensor locations can vary in time.
The contributions of this paper are: • We propose a dynamic programming framework for deriving a stopping time that exploits the spatial information and minimizes the average detection delay (ADD) under a constraint on the false alarm probability. An optimal detection procedure that minimizes the associated Bayes risk is derived. • As the optimal procedure is hard to implement in practice, we propose a more practical detection procedure based on thresholding the posterior probability of the change point having occurred. To theoretically justify the use of this simpler procedure, it is shown that the procedure is a limiting form of the optimal procedure in the rare event regime. A convenient recursive formula for computing the posterior probability is developed by using the radial change propagation pattern of the spatial event. • The probability of false alarm (PFA) control of the thresholding procedure is established and its ADD is analyzed in the asymptotic regime. Conditions under which the threshold procedure achieves asymptotic optimality are provided. Furthermore, we analyze the procedure in the specific setting of quickest detection of attenuating random signals. An approximation for asymptotic ADD is provided, and asymptotic optimality is established. • We extend the single-event detection procedure to the detection of multiple statistically independent spatial events in parallel by combining the developed posterior probability threshold procedure with a Multiple Hypothesis Testing (MHT) setup. The rate of false alarms is controlled using the False Discovery Rate (FDR) criterion, which is widely used for MHT [11], [44], [45]. This is highly relevant in many modern applications where high dimensional data must be processed in parallel and there may be multiple events taking place at the same time [11], [18], [21]. It is shown that the proposed parallel procedure strictly controls the FDR level. • Simulations are conducted to verify the theoretical findings. It is clearly demonstrated that exploiting the spatial properties of the event decreases the ADD compared to procedures that do not utilize this information in both single and multiple cluster settings. This benefit is achieved even under model mismatch. It is demonstrated that the gain in performance is the largest when the event propagates sufficiently slowly compared to the sampling rate, or when the sensor displacements are large, and/or when the pre-and post-change probability models are different enough. Preliminary results of this paper appear in conference papers [46] and [47]. This paper is organized as follows. In Section II, we formulate the quickest detection problem. A dynamic programming framework for the detection is formulated in Section III. Under the radial change propagation setup, a change-point detection procedure and its extension to multiple parallel change-point detection are presented in Sections IV and V, respectively. Our simulations and conclusions appear in Sections VI and VII, respectively.

Notation
Scalar random variables are denoted by normal font capital letters, with the exception of the change point t, which is also a random variable. Scalar constants, such as realizations of random variables, are denoted by normal font lowercase letters, with the exception of L, M , N , K and R, which are constant integers, and U n,m,r which denotes an event. Boldface uppercase and lowercase letters are used for vector random variables and constants, respectively. For an integer K, we use [K] to denote the set {0, 1, ..., K − 1} of cardinality K.

II. MODEL AND PROBLEM FORMULATION
We begin by describing the model for a single spatial change-point detection problem. The model, relevant terms, and notations for multiple change-point detection in parallel will be described in Section V. Let (Ω, F, P) denote a probability space, where Ω is the sample space, F is the σalgebra generated by Ω, and P is a probability measure. The expectation operator with respect to P is denoted by E.
At each time slot we have sensors in known locations but with arbitrary configuration within a domain of interest, S ⊂ R 2 . The set of sensor locations at time slot n is denoted by A n and the corresponding number of sensors is |A n |. If |A n | = 0, then there are no observations received by the FC at time slot n. Unless otherwise stated, in this paper the sensor locations are considered known and deterministic.
We consider a centralized setting where every sensor communicates its observations or local decision statistics to the FC. At time slot n, the data transmitted by the sensors and received by the FC are random variables X n . In mobile scenarios, the location information a is communicated to the FC in addition to the observation value. Alternatively, the FC can have a capability to reliably estimate the locations of the sensors. Uncertainty in the location estimate could be represented as a probability distribution, which could be averaged over in a Bayesian framework. However, for the purposes of this paper we assume for simplicity that reliable point-estimates of the sensor locations exist. We define the |A n | × 1 data vector X n that contains all the observations transmitted at time slot n, including the locations at which the observations were obtained by mobile sensors. It is assumed that at time slot n the FC has access to the current and past observations, I n = (X 1 , . . . , X n ), where I 0 is the empty set.
At a random time instant, t, a source becomes active and starts emitting a propagating signal/event from an unknown origin, O, causing a disruption in the domain of interest. It is assumed that the initial event time, t, has a geometric prior distribution with parameter, ρ ∈ (0, 1), i.e.
where N 0 = N {0} and N is the set of positive integers. The geometric prior distribution is very common in changepoint detection because it is a mathematically convenient memoryless distribution, which is also relevant in a variety of practical applications [7], [8], [10], [15]. We want to discover the initial event time, t, with minimal delay while controlling the PFA. In a multiple change-point setup, as will be described in Section V, the PFA is replaced by the FDR criterion in a MHT framework. Hereafter, we refer to t as the change point even though it may not cause an instantaneous change in the received observations as in classic change-point detection, due to the fact that the sensors are in distinct locations and displaced from the signal source.
In order to reduce the complexity of the problem, we initially assume that O ∈ O ⊂ S where O = {o 0 , . . . , o M −1 } is a finite set of possible source or emitter locations in S with cardinality |O| = M . We assume that the initial event can occur in any of the possible locations, o m ∈ O, with equal probability The assumption of a uniform probability distribution is made for simplicity and is not necessary for the following derivations. In practical settings it may not be realistic to expect that the source location can only appear within a known finite set of points. However, choosing the set O to be a sufficiently dense discretization of the 2D-plane can allow one to approximate the continuous field well and reduce the complexity, as is demonstrated in the simulation section. In this work the true location is initially assumed to lie within a known finite set in order to facilitate a dynamic programming solution. Additional signal processing may be applied to obtain point estimates of the location. That thoroughly studied source localization topic is outside the scope of this paper.
The chosen sampling rate and duration of the discrete time slot used in acquiring the observations can highly affect the sensitivity of the network to the spatial event, and the time resolution and delay of detecting the change. Generally, the time-domain sampling rate should be selected so that one can distinguish among differences in the disruption arrival times at different sensors. In the considered model, if the event propagates with a constant radial velocity, we model the sampling rate such that during each time slot the radius of the disruption area increases by a fixed unit, e.g. some fraction or multiple of the wavelength. In order to take a variety of random propagation effects into account, we allow some randomness in the propagation of the spatial event. For example, epidemic spread may have a high degree of stochasticity due to random movements and interactions among individuals [22]. Generally, propagation randomness can be due to randomness in the velocity [22, Eq. (6.8)], due to timing jitter [48], or due to reflections, non-homogenous medium, scattering, and multipath [35].
Let R n denote the area radius of the propagating event at time slot n. For simplicity, we assume that the area radius can have only discrete integer values corresponding to a fixed distance unit. Let R ∈ N denote the smallest disruption area radius that covers the entire domain of interest, S, regardless of the actual point of origin, O ∈ O. Thus, we assume that R n ∈ [R + 1]. It is assumed that R n = 0 when n < t, and R t = 1, i.e. only when the initial change occurs, the event area radius expands by one unit. In addition, we assume that At each time slot after the initial change the radius of the disruption increases by one radius unit with probability ρ 1 ∈ (0, 1] and stays the same as in the previous time slot with probability 1 − ρ 1 . As R is the maximum radius of the affected region, if R n = R then R m = R for m ≥ n. Using (1), we obtain In addition, we obtain P(R 0 = 0) = 1−ρ and P(R 0 = 1) = ρ. At each sensor location it is assumed that the sensor observes the disruption only if the disruption is present in this location, i.e. the distance between the sensor location and the source location is smaller than the current area radius of the disruption. Assume that the disruption is emanating from a source at O = o m . Then, if a sensor at location a ∈ S is not exposed to the disruption, it acquires a noise-only observation coming from a known null probability density function (pdf), f 0 . Otherwise, if this sensor is exposed to the disruption, it receives an observation with known pdf, f (a,om) 1 , that may depend on a ∈ S and o m ∈ O. For example, the power of the received signal can affect the parameters of the alternative pdf, and due to path loss may depend on the displacement between the sensor and the source [35], [49]. In many applications, the f 0 density represents random noise only, the statistical properties of which can be either known from theory, or estimated from training data even locally for each sensor in the absence of signal. On the other hand, the exact f 1 distribution, influenced by the appearing signal, may not always be known in practice. The issue of dealing with uncertainty in the f 0 and f 1 distributions has been an active topic of research in the field of quickest detection, see for example [50], [51] and references therein. Therefore, in this work we consider the probability models to be known, and refer to the existing literature for solutions on handling any model uncertainty. Moreover, it will be observed in Section IV that for the PFA control it suffices to know only the f 0 distribution. In many detection problems, controlling the false positives is crucial so that the system is not overwhelmed with detections and subsequent tasks. Conditional on the true system state and the sensor locations, the observations at each time slot are assumed to be independent across the sensors, as well as independent of all previous observations. Since the individual sensors are distributed and in distinct locations, the sensor noise present in any physical measurement can be considered independent.
At each time slot, the FC decides whether the initial event has taken place or not based on the information, I n , which is available at time slot n. To this end, it uses a stopping time, T , according to a predefined stopping rule. The delay in detection is quantified by the ADD, where x + = max{0, x}. The PFA is defined as III. DYNAMIC PROGRAMMING FOR OPTIMAL STOPPING TIME In a similar manner to classic Bayesian change-point detection [8], our goal is to derive the stopping time where ∆ α = {T : PFA(T ) ≤ α}. Put into words, we want to find a stopping time with the smallest ADD among stopping times for which the PFA is not larger than α, where α ∈ (0, 1) is a predefined tolerated level of false alarms. In this section, we take a dynamic programming approach for solving (6).

A. Finite horizon
We begin by restricting the stopping time to a finite horizon [0, N ]. Solving the constrained optimization problem in (6) can be approached by formulating a Lagrangian relaxation problem that minimizes the Bayes risk over all admissible stopping times. The state of the system at time n is denoted by S n ∈ {(m, r) : m ∈ [M ], r ∈ [R + 1]} ∪ Υ, with S n = (m, r) meaning that at time n the event originating from O = o m has radius R n = r. The term Υ represents the terminal state that the system goes into after a change is declared. In case R n = 0, the spatial event has not occurred yet. From the description in Sec. II it is clear that the system state S n evolves as a Markov process. Moreover, conditional on the system state and sensor locations, the observations are i.i.d. As such, the problem lends itself to a dynamic programming solution.
Since {T < t} ⇔ {R T = 0}, the Bayes risk in (7) can be expressed in additive form as [10], [15] In a finite horizon, we denote the minimum expected cost-togo from n to N by J N n (I n ), which is in general a function of all available information I n at time n. The cost-to-go function obeys the backwards recursion In (9) the first term inside the minimum corresponds to the expected cost of stopping at n, and the second term denotes the expected cost of continuing the monitoring process. We denote the posterior probability of the event {S n = (m, r)} given I n by p n,m,r . That is, and p n = [p n,0,0 , ..., p n,1,R , p n,2,0 , ..., p n,2,R , ..., p n,M −1,R ] (12) is a M · (R + 1) dimensional vector that collects all of the probabilities of time n. In the next subsection, we present a recursive update formula for p n that will be used in the dynamic programming solution.

B. Posterior probabilities computation
At any time slot, n, the sample space of the considered setup, Ω, can be partitioned as where U n,m,r = {O = o m , R n = r} = {S n = (m, r)} (14) are pairwise disjoint events and the events {O = o m } and {R n = r} are independent. In the following, we derive a convenient recursive formula for computing p n,m,r = P(U n,m,r |I n ). Repeated use of the Bayes rule allows us to write p n,m,r as p n,m,r = f (x n |U n,m,r )P(U n,m,r |I n−1 ) M l=1 R r=0 f (x n |U n,l,r )P(U n,l,r |I n−1 ) .
In addition, according to the assumed propagation model, R n−1 can only be equal to R n or less than R n by one. Therefore, by the law of total probability, Bayes rule, and (2)-(3), we can write P(U n,m,r |I n−1 ) = P(U n,m,r |U n−1,m,r−1 )p n−1,m,r−1 + P(U n,m,r |U n−1,m,r )p n−1,m,r .
The conditional probabilities that the radius increases by one radius unit during one time slot for different radius values are In particular, it is seen that P(U n,m,r |U n−1,m,r−j ), j = 0, 1, is independent of n. It should be noted that the radius of the area of the spatial event can reach a radius r no earlier than time slot n = r − 1.
At time n, the probabilities p n can be updated using only the probabilities at the previous time step p n−1 , current observation vector x n , and prior information. Thus, even as data accumulates with time, the amount of computations required for computing p n remains constant per time slot. In particular, at time slot n the amount of computations required for computing p n,m,r is O(M R).
It is observed that p n depends on I n−1 only through p n−1 and by (10) we have J N N (I N ) = J N N (p N ). Then, a simple induction argument shows that p n is a sufficient statistic for the program, i.e. the minimum expected cost-to-go from n to N can be expressed as a function of p n , and thus J N n (I n ) = J N n (p n ). We denote the posterior probability of the event having radius r at time n by 6 The Bellman equations from (9) and (10) can then be expressed as J N n (p n ) = min(π n,0 , c(1 − π n,0 ) + D N n (p n )), (19) and where can be expressed as a function of p n similarly to [10], [15].

C. Extension to infinite horizon
In this subsection, we remove the upper bound on T , and consider the case N → ∞. We write J n = lim N →∞ J N n for the cost-to-go function in the limit and similarly D n = lim N →∞ D N n . The limits are well defined, since 0 ≤ J N n (p n ) ≤ 1 and J N n (p n ) ≥ J N +1 n (p n ) for all p n , n and N . Therefore, we obtain J n (p n ) = min(π n,0 , c(1 − π n,0 ) + D n (p n )) n ∈ N 0 , (22) where D n and J n are non-negative functions on the M × (R + 1)-dimensional simplex. It is then seen that the optimal stopping time T opt is of the form where the change is declared the first time the posterior probability of the event not being present drops below c(1 − π n,0 )+D n (p n ). In general, the structure of D n is not explicitly known, hence no closed-form optimal solution exists. Furthermore a numerical approximation of the optimal stopping time is computationally challenging and may be hard to analyze.
An interesting special case is the regime where the initial disruption is a rare event, i.e. ρ → 0. The following result establishes that in this scenario the optimal test T opt converges in probability to a simple threshold test on π n,0 which provides an attractive solution for practical use. Theorem 1. The optimal stopping rule in (23) converges in probability to a threshold test for a properly chosen Q as ρ → 0.
Proof: See Appendix A. In the following section, we propose a procedure denoted as the radial propagation (RP) procedure for single change-point detection, which is based on the threshold test from (24).

RADIAL PROPAGATION
In the previous section, it was observed that in the limit ρ → 0, the optimal Bayesian stopping rule converges to a simple threshold rule T Q , defined in (24). In this subsection, we study the performance of this stopping rule for any ρ.
The following proposition provides an upper bound for the false alarm of probability of T Q . Proposition 2. The false alarm probability of T Q from (24) can be upper bounded with PFA(T Q ) ≤ Q.
Proof: By combining (5) and (8), one obtains where the second equality is obtained using the law of iterated expectations, the third equality is obtained from the definition of π n,0 in (18) and the inequality from the definition T Q .
Remark. It should be noted that the PFA upper bound of Proposition 2 is valid even if many of the model assumptions are violated. As T is a stopping time, {T < t} ∈ I t−1 . As all observations in I t−1 are generated from the pre-change model, it is clear from the definition of the probability of false alarm in (5) that the PFA depends only on the pre-change observations. Violations of the assumed post-change behavior, such as a misspecified f 1 or departures from the assumed propagation model do not impact the PFA. This is a useful property, since the post-change distributions (usually generated by signal + noise) are often more difficult to characterize than the prechange (noise only), as training data may be available from the pre-change probability model only.
From here on, we refer to the stopping time T Q as the radial propagation (RP) procedure, where the stopping threshold is chosen to equal the false alarm upper bound α,

A. Asymptotic optimality
In full generality, the ADD of the RP procedure is tedious to analyse due to the unknown source origin point, the potential mobility of the sensors and their arbitrary locations at each time slot. In order to shed some light on the ADD of the RP procedure, we provide sufficient conditions under which the RP procedure is asymptotically optimal in the vanishing PFA regime α → 0.
In the asymptotic analysis we consider the case where the disruption propagates in a deterministic fashion with constant velocity, i.e. ρ 1 = 1, so that its area radius increases by one unit in each time slot up to the maximum radius, R. The observations X n are conditionally independent with prechange pdf f 0 and post-change pdf f In order to analyze the asymptotic detection delay, some conditions on the long-term behaviour of the log-likelihood ratio process Z k,m n are required. It is assumed that there exists some q m such that for every k and m, on Note that if the post-change distribution is independent of location, i.e. f Proof: From the definition of ADD we have that where ADD m (T ) = E [(T − t) + |O = o m ] is the detection delay when the true source location is o m . Conditional on the source location o m being known, the problem reduces to a standard Bayesian quickest detection formulation with a noni.i.d. post-change distribution given by (26). A lower bound for the asymptotic detection delay of any procedure in the class ∆ α in this setting was derived in [8]. Specifically, by [8, Thm The Lemma follows from combining (30) and (31). It should be noted that the asymptotic lower bound from (29) is identical to the lower bound for the case of instantaneous change, where all the sensors are affected at the same time [12].
The almost sure converge of Z k,m n , as required in (28), is not sufficient for proving the asymptotic optimality of the threshold rule T RP . Therefore, in the following theorem we impose some mild additional assumptions on the rate of convergence of Z k,m n to q m and show that T RP is asymptotically optimal and attains the lower bound from (29). To this end, we define for > 0 the random variable, which is the largest value of n for which the absolute difference between 1 n Z k,m k+n−1 and q m is larger than . It is required that (32) Similarly to [8,Eq. (3.22)], the condition in (32) is a joint condition on the convergence rates of 1 n Z k,m k+n for each t = k and the prior distribution of the change point t. In particular, it is analogous to complete convergence [52] of 1 n Z t,m t+n to q m under the distribution of t.
Theorem 4. Suppose the conditions of Theorem 3 are satisfied and assume that (32) is satisfied. Then T RP is first-order asymptotically optimal in the limit α → 0, i.e.

(34)
Proof: The proof is given in Appendix B.

B. Detection of attenuating signals
In this subsection, we show how the obtained asymptotic results can be used to accurately approximate the expected detection delay in practically relevant settings. We consider the case of detecting an attenuating random Gaussian signal in additive noise. This model is highly relevant in a variety of practical applications in e.g. wireless communications and radar [53]. Prior to the change, only zero-mean i.i.d. Gaussian noise with variance σ 2 is observed. At an unknown time t, a signal source becomes active somewhere in the field. If the signal does not have any known structure, it can be modelled as zero-mean Gaussian with variance γ 2 , where γ 2 is the signal transmit power. In free space, radio wave power decreases as the inverse square of distance d between the source and the receiver [35]. In most practical wireless settings, the path loss exponent, denoted here by θ, is usually greater than 2 due to obstacles, reflectors and scatterers. Therefore, for a sensor at distance d away from the source excluding antenna and frequency dependent factors, the observed signal is of the form N (0, γ 2 /d θ ), whered = max(d, 1). Denoting f = N (0, σ 2 + γ 2 /d θ ). Suppose for analysis purposes that the signal source location is known, that the domain of interest is a disk with large radius R centered at the signal source, and that at each time step there are L sensors located independently and uniformly at random within the disk. The following result establishes that T RP is asymptotically optimal in this setting, and provides a first order approximation of the asymptotic detection delay.
Proposition 5. Under the conditions described in Section IV-B, T RP is first-order asymptotically optimal. Moreover, in the free-space conditions of path-loss exponent θ = 2 where and φ = γ 2 /σ 2 is the SNR in linear scale.
Proof: The proof is provided in Appendix C. Observe, that the expression for Lq φ in (35) further simplifies in the limit R 2 , L → ∞. When R 2 , L → ∞ such that L/R 2 → λ, we have where λ represents the average number of sensors per unit area.

V. EXTENSION TO MULTIPLE CHANGE-POINT DETECTION
In this subsection, we briefly describe how the simple structure of the RP stopping rule allows its use in settings when one is monitoring multiple separate fields and signal sources at once. This is a very relevant case in practice since in IoT, wireless networks or radar systems there may be multiple active signal sources, abrupt events or targets simultaneously and there is a need to strictly control the false positives in decision making while detecting changes rapidly. Suppose that there are K ≥ 2 distinct clusters of sensors. For each cluster, k ∈ [K], there may exist a random initial event (change point), at time t (k) , that propagates and affects the sensors in the cluster according to the model in Section II. We allow the probability of no event in a cluster to be non-zero, where no event implies an infinite change point. The spatial events and sensor observations of the different clusters are assumed to be independent. This assumption, while restrictive in general, is reasonable in cases where the sensor clusters exist in spatially dispersed locations, and the events are spatially localized. The assumed setup is illustrated in Fig. 2. We would like to derive multiple stopping rules, T (k) , k ∈ [K], in order to discover all the change points, t (k) , k ∈ [K], respectively, while strictly controlling Type I errors. We employ a sequential multiple hypothesis testing framework for this purpose.
A practical assumption for any sequential detection procedure is that it must be stopped at some finite time instance. Thus, we allow the existence of a deadline N max for the multiple change-point detection. If a change point in the kth cluster has not been declared before time slot N max , we declare that there is no spatial event in the kth cluster and set T (k) = ∞. For detecting multiple change-points in parallel, the False Discovery Rate (FDR) is a relevant false alarm rate criterion [11], [21]. This criterion is defined as .
The term V is the number of false discoveries (false alarms) under deadline, i.e. the size of the subset of [K] s.t. T (k) < t (k) and T (k) < N max . The term R denotes the number of discoveries under deadline, i.e. the size of the subset of [K] s.t. T (k) < N max . We would like to control the FDR s.t. it will be no higher than a predefined tolerated level α ∈ (0, 1). Taking into account the possibility of infinite change points, we denote by K f the random number of finite change points and define the overall ADD as where for K f = 0 the argument of the expectation in (38) is zero. In case the no change-point probabilities are zero we can rewrite the ADD as For the considered multiple statistically independent clusters, we will implement the following K parallel stopping rules: where π (k) n,0 is the posterior probability of a cluster change point having occurred in cluster k. A cluster change point is the time of an initial event in the cluster. The threshold choice in (40) guarantees FDR control under upper bound α, in accordance with the parallel version of the IS-MAP procedure in [21].
In each cluster it is assumed that there is no change point with probability p ∞ and with probability 1 − p ∞ the prior distribution of the initial change point, t (k) , is geometrically distributed with parameter ρ. Under the above assumptions, the change-point posterior probability update is similar to the one described in Subsection III-B and implemented for each cluster separately. However, some expressions for probabilities need to be rederived. For simplicity of presentation, we omit the cluster index, k ∈ [K], in the following expressions. For a specific cluster, by using (1) and the no change-point probability, p ∞ , we obtain P(R n = 1|R n−1 = 0) = P (t = n|t ≥ n) n ∈ N, where we recall that P (R n = 1|R n−1 = 0) = 1 − P (R n = 0|R n−1 = 0). In addition to (41), the following expressions are rewritten to take into account the no changepoint probability: and . (2) A n (1,2) A n (1,1) A n (2,1) A n (2,2) Source Sensor

VI. NUMERICAL SIMULATIONS
In this section, we evaluate the performance of the RP procedure in terms of PFA and FDR control and ADD performance under different radial propagation models and different multi-sensor configurations. In order to better understand the behavior of the RP procedure, we compare its performance to other procedures that either know the unobservable true source location, or deploy a more simplistic propagation model. Robustness to misspecification is tested by implementing a misspecified RP procedure that incorrectly assumes the event to propagate much faster than it does.

A. Simple Gaussian observation model
We begin by considering a single cluster and a simple Gaussian observation model where f 0 = N (0, 1) and f 1 = N (0, 1+γ 2 ), no matter the sensor location and the event origin point. In all experiments L = 100 sensors are randomly placed on the field at each time instance. The true source location is selected randomly from a uniform distribution over the field. Observe, that this is in contrast to the design-stage assumption that the true source locations lies in the finite set O. The RP procedure is compared against two other procedures. The first one is an Oracle version of the RP procedure that knows the exact source location. The Oracle procedure is a special case of the RP procedure with |O| = 1. The other procedure implemented for comparison purposes assumes that the event, once it appears, affects all sensors instantly [12]. We refer to this procedure as the Instant procedure. The Instant procedure is also a particular special case of the RP procedure, where one assumes R n = R for n ≥ t and R n = 0 for n < t. It is to be expected that this procedure will provide inferior performance to the RP procedure, as it does not take the dynamic nature of the propagation into account. However, the comparison will provide insight into to the behaviour of the RP procedure by highlighting the scenarios in which the performance gap between the properly specified and misspecified procedures is significant, and where the difference in performance is smaller.
We start by setting ρ = 0.02, ρ 1 = 0.25, γ 2 = 1 and considering a square spatial field S = [0, 10] × [0, 10] where the sensors and sources are located. The set O used by the RP procedure is taken to be an equally spaced grid of M points which covers the field of interest. In addition to the properly specified RP procedure, we implement a mismatched RP stopping rule (with M = 50), which correctly assumes that radius increases with probability ρ 1 but with increments of 5 times the true radius increment (1 unit). It corresponds to a setting where the real event propagates slower than assumed by the RP procedure.
In Table I, observed false alarm probabilities of all procedures for different stopping thresholds α are displayed. It is confirmed that the theoretical PFA upper bound derived in Proposition 2 holds in all cases. In the top plot of Figure 3, the PFA-ADD trade-off curves are plotted for the procedures, with the RP procedure implemented using source location grids of density M = 10, 50, and 100. For this small field, the performance of the RP procedure is comparable to the Oracle procedure. Furthermore, it is observed that increasing the density of the location grid in the RP procedure improves performance. However, under this configuration for M = 50 and M = 100 the gap in performance is already indistinguishable. All versions of RP procedure, including the misspecified one, outperform the Instant procedure. In the middle plot of Figure 3, the procedures are compared for varying values of the propagation parameter ρ 1 , with α = 0.01 fixed. For small values of ρ 1 , the Instant procedure experiences performance loss in comparison to the others. This is because when ρ 1 is small the event will expand slowly with respect to the discrete-time sampling rate and thus remain spatially localized for a longer time, making it harder to detect for the Instant procedure. In general there is an inverse relationship between ADD and ρ 1 for all procedures, as a larger ρ 1 implies that the event will be visible to more sensors quicker. The RP and Oracle procedures provide near identical performance for all ρ 1 . The mismatched RP procedure achieves lower ADD than the Instant procedure for all ρ 1 values. In the bottom plot of Fig 3, we fix ρ 1 = 0.25, α = 0.1 and vary the signal power parameter γ 2 . For unit noise variance, we have SNR (dB) = 10 log 10 (γ 2 ). It is observed that at low SNR regime the difference between the RP and Instant procedures is smaller, but for moderate and and high SNRs a clear gap in performance in favor of the RP procedure again emerges. Moreover, the difference in performance between the RP and Oracle procedures is small, and the size of the perfomance gap is relatively independent of SNR.

B. Detection of attenuating radio signals
In this subsection, we implement the attenuating signal model introduced in Subsection IV-B. To demonstrate the extension of the RP procedure to the detection of multiple events in parallel, we consider a setting with K = 20 distinct, independent sensor clusters each with 100 sensors as described in Sec. V. Prior to the change in a given cluster, all sensors observe noise only, so that f 0 = N (0, 1). When a signal source appears in the kth cluster at time t (k) , it starts emitting an i.i.d. random signal modeled as N (0, γ 2 ). Due to path loss, the received signal strength attenuates according to a path-loss exponent θ of the distance d from the source. The signal and noise are considered additive, hence for a sensor at distance d from the signal source we have f with d 0 being a reference distance where the received signal power equals γ 2 . In the case of radio waves, the signal propagates at the speed of light c. The sensors take discrete time samples with some common sampling rate f s . Therefore, the signal area radius expands in a deterministic manner (i.e. ρ 1 = 1) by c/f s meters in a single time step. We take each cluster area S k , k ∈ [K] to be a square field with side length 5 km. The time at which the signal appears, t (k) , is considered to have an exponential prior distribution with a mean (in seconds) of β = 10 in all sensor clusters. A routine computation utilizing the properties of the exponential and geometric distributions then shows that the sample index at which the emitted signal first appears obeys a geometric distribution with parameter ρ = 1 − exp(−1/(βf s )). The RP procedure is again compared against an Oracle procedure that knows the true signal source location in each cluster, and the exact propagation dynamics. Additionally, two versions of the Instant procedure are implemented. The first one (called Instant-Oracle) knows the true and unobservable source location in each cluster, but assumes that the event reaches all sensors in the cluster immediately. The other one (Instant) assumes similarly to the RP procedure that the source location in each cluster belongs to a finite set O, and that the change is immediate everywhere in the cluster. Note that in the setting of Subsection VI-A knowledge of the true source location is not utilized in the Instant procedure since the appearance of the event was assumed to immediately change all sampling distributions from f 0 to f 1 no matter the source location. However, in this setting, the post-change sampling distribution f (d) 1 depends on the distance of the sensor from the source. Therefore, knowing the true source location has value even if the propagation is assumed immediate. Consequently, we obtain an interesting comparison between the RP and the Instant-Oracle procedures, as the RP procedure is aware of the propagation dynamics, but the Instant-Oracle has knowledge of the true source location.
In all clusters, we set the signal power γ 2 = 2 at a reference distance of 500m from the source, α = 0.01 and the path loss exponent θ = 2. The true source location of each cluster is sampled uniformly at random from S k . The sensor locations are also random and uniform, and assumed to remain stationary during the monitoring process. In Figure 4, the procedures are compared for different values of the sampling rate f s . The detection delay decreases for all procedures as the sampling rate increases, and the difference in ADD (in microseconds) between the RP and Oracle procedures shrinks as the sampling rate increases. It is observed, that for sufficiently high sampling rates the RP procedure achieves smaller detection delay than the Instant-Oracle procedure. When the sampling rate is high, accounting for the propagation dynamics is more valuable than theoretical knowledge of the source location, and vice versa when the propagation is rapid in comparison to the sampling rate. In Table II, the observed FDR values are displayed for different choices of the stopping threshold α when f s = 1 MHz. It is demonstrated that the RP procedure controls the FDR below the prespecified level α.

VII. CONCLUSION
In this paper, we proposed a method for Bayesian quickest detection of spatial events with radial propagation patterns using a mobile sensor network. First, we considered a single spatial event. A dynamic programming framework was used to derive the structure of the optimal stopping time in terms of ADD under upper bound constraint on the PFA. The optimal procedure has a complicated structure and implementing an  approximation is computationally challenging and infeasible to analyze. Therefore, utilizing a limiting form of the optimal procedure we proposed the simpler RP procedure that employs a stopping threshold on the posterior probability of the change point of interest. It was shown both analytically and experimentally that the RP procedure controls the PFA under a prespecified upper bound, even if the post-change probability models are misspecified. In addition, we showed that under some conditions the proposed RP procedure coincides with an asymptotically optimal procedure in terms of ADD as the PFA upper bound α → 0. Then, we proposed an extension to parallel detection of multiple spatial events occurring in distinct clusters. The proposed method stems from a multiple hypothesis testing problem formulation and strictly controls FDR criterion while taking into account the spatial nature of the observed phenomena or fields. A posterior probability update expression for multiple change-point detection which takes into account a probability that no event appears was derived.
In the simulations it was observed that for phenomena that propagate slowly with regard to the sampling rate, the RP procedure vastly outperforms a procedure that assumes that the effect takes place instantly everywhere in the field.
Similarly, in the high SNR regime the RP procedure provided significantly better performance than the Instant procedure. When the event propagates very quickly in relative to the sampling rate, or alternatively the SNR is very low, the performance gap was smaller, although still in favor of the RP procedure.
Topics for future research include the derivation of spatial procedures for multiple change-point detection and localization, where the locations of the signal sources are estimated using the observations. Additionally, extending the RP procedure to a non-Bayesian framework and studying its possible optimality properties is an interesting direction of future work.

APPENDIX A PROOF OF THEOREM 1
Stemming from [10, Th. 2], our proof proceeds by showing that the optimal stopping time T opt can be written as T opt = inf n ∈ N 0 : π n,0 < c + Ψ n c + ρ , where Ψ n is a function such that Ψn ρ → 0 as ρ → 0. The desired threshold test structure from (24) is then obtained in the limit ρ → 0. Let us define Ψ n = D n (p n ) − (1 − ρ)π n,0 .
Substituting this definition into (23) and rearranging gives (42). Then, the convergence of Ψn ρ → 0, as ρ → 0, can be shown by introducing the transformation q n,r = π n,r ρπ n,0 ⇐⇒ π n,r = q n,r R r=0 q n,r . This expression allows for using the steps in [10, Th. 2] to complete the proof.

APPENDIX B PROOF OF THEOREM 4
The proof is in two parts. First we define a set of M stopping times T (0) , ..., T (M −1) , such that for some threshold ν, where W (m) n = P(t ≤ n|I n , O = o m ). In Lemma 6 below, we show that a stopping time defined as the minimum of these M stopping times with thresholds ν = 1 − α/M achieves the asymptotic ADD lower bound. Then, it is shown that ADD(T RP ) ≤ ADD(T * ), and the Theorem follows. i.e. T * achieves the asymptotic ADD lower bound in (29).
Proof: Observe first from the definition of ADD that where ADD m (T * ) = E [(T * − t) + |O = o m ]. When o m is the