Optimal Event-Based Policy for Remote Parameter Estimation in Wireless Sensing Architectures Under Resource Constraints

Energy is a resource bottleneck in wireless sensing networks (WSNs) relying on energy harvesting for their operations. This is pronounced in WSNs whose data is used for remote parameter estimation because only a subset of the measured information can be transmitted to the estimator. While much attention has been separately paid to communication schemes for energy-aware data transmission in WSNs under resource constraints and controlled parameter estimation, there has yet to emerge a censoring policy that minimizes the variance of a measured process’ estimated component parameters subject to realistic constraints imposed by the WSN. Consequently, this paper presents the derivation of an optimal event-based policy governing data collection and transmission that accounts for energy and data buffer sizes, stochastic models of harvested energy and event arrivals, value of information of measured data, and temporal death. The policy is optimal in the sense that it maximizes the information rate of transmitted data, thereby producing the best possible estimates of the process parameters using the modified maximum likelihood estimation given the system constraints. Experimental and simulation-based results reflect these objectives and illustrate that the framework is robust against significant uncertainty in the initial parameter estimates.


I. INTRODUCTION
The proliferation of low-cost and miniature, yet high-performing, sensors has enabled sensing in the natural and built environments across a broad range of applications, including infrastructure, transportation systems, health care (e.g., wearable devices), surveillance, industrial control, and environmental systems, to name just a few. Sensing platforms within these domains have been aided by advancements in wireless communication to yield wireless sensor networks (WSNs) that enjoy low costs, low communication latency, and access to cloud computing services for more sophisticated data analysis. Despite these advances, the availability of energy remains a bottleneck in WSNs that do not have access to persistent power sources. WSNs often rely on energy harvesting (EH) from the environment, such as solar and vibration energy, which constrains the WSN's operation [1]. Consideration of harvested energy as a limited, and uncertain, resource is especially challenging in WSN architectures that are used to collect and transmit measured data to a remote estimator [2]. Since only a subset of the measured information can be wirelessly transmitted to the estimator for processing, there is an inherent tradeoff between the quality of the measured process' parameter estimates and the energy required to carry out data transmission. Such sensing architectures should ideally transmit high-value data that increases the quality of the parameter estimates and reject low-value data that provides less information about the estimated parameters.
Solving this problem requires joint consideration of 1) energy-aware data transmission in WSNs under resource constraints and 2) controlled parameter estimation over WSNs. Both domains have independently gained considerable traction in the wireless communication and signal processing research communities, respectively, and each has addressed separate challenges involving resource-constrained data collection and transmission. Communication schemes for WSNs with wide-ranging energy recharging models and objectives have emerged in recent years to augment advances in the efficiency of energy harvesting hardware circuits and design. While strategies for optimizing transmission objectives, such as maximizing throughput or minimizing transmission delay, have been widely studied [3]-[7], of note is the growing interest in communication schemes that consider the availability of energy and differentiate the importance of measured data. These selective communication strategies (also known as "censoring" or "event-based" strategies) explicitly account for the reward (or value) gained by transmitting data; the primary objective is to derive a threshold or transmission criterion that optimizes the tradeoff between available resources (e.g., communication energy) and the expected reward rate [9]-[13]. Within these strategies, censoring refers to a condition in which the observations of a measured process are only partially known. For example, for left-censored processes, if the value of an observation is greater than or equal to the optimal threshold, then it is transmitted. If the observation is below the optimal threshold, then it is discarded, meaning it is known that an event occurred but the value of the event is unknown. Sensor censoring was first proposed by Rago et al. [14] to reduce communication rates in sensor signaling.
While these strategies address the operation of energy-aware WSNs, the proposed policies assume that each collected data packet is transmitted immediately upon collection (i.e., a data storage buffer is not considered) and cannot be naturally extended to account for the utilization of a data buffer. Since data transmission is typically the most significant source of energy consumption in a wireless sensing architecture [15], accounting for the storage of (potentially) large amounts of data in a buffer that are communicated to the remote estimator in batch transmissions would lead to significant gains in the expected reward rate. Additionally, these policies cannot be naturally extended to jointly consider the quality of parameter estimates that are generated remotely from the collected data. In other words, following these policies, parameter estimates cannot be reconstructed from the subset of transmitted data alone. This is in part due to the dependency between the optimal thresholds and the state of charge (SoC) proposed in these strategies. Since the missingness of censored (i.e., non-transmitted) data below the optimal threshold is classified as not missing at random (NMAR) [16] and the remote estimator does not have knowledge of the SoC, the parameter estimation of the censored measured process requires transmission of the 1) values of accepted observations, 2) censoring thresholds associated with each accepted observation, and 3) censoring thresholds associated with each censored observation that is rejected. This necessitates a transmission for every observation (both accepted and censored), thus reducing these existing policies to suboptimal transmit-all policies when used for remote parameter estimation.
Conversely, while remote parameter estimation in WSNs based on censoring policies has been studied widely as an energy-saving approach [14] [17] [18] [19] [20], objective constraints (e.g., transmission rate) are assumed a priori without consideration of stochastic energy recharging models, major hardware constraints (e.g., storage buffer, temporal death), and the stochastic nature of event arrivals.
The work presented herein reconciles these two problems and presents the derivation of an optimal threshold-based data collection and transmission policy for remote parameter estimation in WSN architectures that minimizes the variance of the measured process' parameter component estimates under resource and hardware constraints. Here, a reward process formulated as a discrete-time Markov chain is developed that models system constraints imposed by the WSN's energy buffer (e.g., battery) size, data buffer (e.g., static random-access memory (SRAM)) size, stochastic models of energy and event arrivals, the value of information of measured data, and temporal death. Notably, consideration of a data buffer, and corresponding batch transmission, within the system model leads to significant gains in reducing the variance of the parameter component estimates as compared to existing related literature, which assumes data is transmitted upon collection. The proposed model accounts for temporal death, in which all incoming data are discarded regardless of their values because the energy buffer of a WSN node is completely depleted. Within the Markov model, the reward gained by transmitting data is equal to the Fisher information [21] the observation carries about the unknown parameter, and state transitions are governed by a unique censoring value that is known by the remote estimator. The reward gained by rejecting data is equal to the Fisher information carried by a censored observation. Given the proposed model, this paper derives the unique censoring threshold value that is optimal in the sense that it maximizes the Markov chain recurrent class' average reward rate. This paper proves that the measured process' parameter component estimates can be reconstructed from the subset of data transmitted according to the optimal policy.
Under regulatory conditions, since the average reward rate is defined to be the average rate of information gained about the unknown process, it is shown that the optimal threshold-based policy guarantees the transmission of a subset of observations that minimizes the variance of the measured process' parameter component estimates given the system constraints. Here, parameter estimation is carried out by the remote estimator using a modified likelihood function that is developed to estimate the censored process' component parameters. It is proven that the modified maximum likelihood estimator (MLE) developed, which accounts for the missingness of data not transmitted, is the maximizer of this likelihood; that a Cramér-Rao bound (CRB) on the covariance matrix of the estimator exists; and that the MLE is consistent, asymptotically unbiased, and asymptotically normal.
The remainder of this paper is delineated as follows. First, the problem formulation and system model are presented in Section II. The system model accounts for constraints imposed by the sensing architecture's energy buffer size, data buffer size, stochastic models of energy and event arrivals, the value of information of measured data, and temporal death. The derivation of the optimal threshold follows in Section III. Since only data exceeding the event threshold value are transmitted (resulting in left-censored batch transmission of data), Section IV proves that a modified likelihood function can be used to estimate the process component parameters from the subset of data transmitted according to the optimal policy. The optimal threshold-based policy maximizes the Fisher information rate of transmitted data, thereby minimizing the variance of estimated process parameters given the system constraints. Finally, experimental and simulation-based results are presented in Section V, where the proposed theoretical framework is implemented to control data collection and transmission in an EH WSN architecture subject to stringent energy constraints imposed by the availability of incoming energy and battery size, as well as varying data buffer sizes.

II. PROBLEM FORMULATION
The proposed framework guiding the energy-aware stochastic scheduling policy for remote parameter estimation discussed herein comprises a WSN node with an embedded data collection and transmission subsystem and a remote parameter estimator (Fig. 1). Within the transmission subsystem, as long as the energy buffer is not fully depleted (i.e., SoC E(t) > 0), the sensing architecture measures a process f(y; θ_0) with true but unknown process parameters θ_0, where, in general, f(y; θ) denotes the probability density function (PDF) of a random variable Y parameterized by θ ∈ Θ in the parameter space Θ ⊂ ℝ^p. If the energy storage buffer is fully depleted (i.e., E(t) = 0), then the arriving data cannot be measured and is automatically discarded. The value assigned to each realization of the measured process, v ∈ {F_c, F_y}, is equal to the Fisher information the observation carries about the unknown parameter and follows the two-point distribution

v = F_c with probability F(τ*_s), and v = F_y with probability F̄(τ*_s). (1)

In (1), the cumulative distribution function (CDF) and complementary CDF are denoted F(·) and F̄(·), respectively, and τ*_s (with * denoting optimal) is the optimal threshold value that maximizes the average information rate of the collected data such that candidate data y is stored and transmitted if and only if y ≥ τ*_s. For the case of a single-parameter distribution, F_y is a scalar denoting the Fisher information that a single stored observation carries about the unknown parameter, and F_c is the Fisher information carried by a single censored observation. Censoring refers to a condition in which the values of data are only partially known. For example, if y < τ*_s, then the data is left censored and consequently rejected; it is known that an event occurred, but the value of the event is unknown because it was never transmitted to the remote estimator. As shown in Fig. 1, the packet header will, for example, include the number of censored observations followed by the values of all collected observations.
Because censoring is a statistic of the original data, a transmitted measurement contains greater information about the estimated process than a censored measurement [21] [22] [23]. For the case of a multiple-parameter distribution characterized by p > 1 parameters that satisfies the regulatory conditions (which will be subsequently detailed in Section IV), F_y = [F_y(θ)]_pp and F_c = [F_c(θ)]_pp for any p, where F_y(θ) and F_c(θ) denote the Fisher information matrices. As long as the SoC is not depleted (i.e., E(t) > 0), candidate data y is censored if y < τ*_s and stored in the data buffer if y ≥ τ*_s, regardless of the energy buffer's exact SoC level. A data buffer is considered that can store k measurements.
Observations with values greater than or equal to the optimal event threshold are stored in the data buffer. Once the data buffer is full with k observations, the stored data, denoted by the vector Λ_s = [Λ_{1,s}, Λ_{2,s}, ..., Λ_{k,s}]^T, is batch transmitted to the remote estimator. Here, s = 1, 2, ..., s_t is an index indicating the batch transmission. The index s = 1 denotes the first batch transmission occurring after monitoring initiation and s = s_t denotes the latest transmission. Batch transmission is used because the overhead energy spent to transmit a packet is relatively high regardless of the payload size [24]. A perfect communication channel is considered; the case of imperfect transmission and feedback acknowledgements can also be considered for applications with significant time-varying wireless fading channel gains. Following Fig. 1, upon receiving a new batch of data, Λ_s, the maximum likelihood estimate, θ̂_{s+1}, is updated by the remote parameter estimator, which then transmits the updated optimal threshold back to the sensing node. An a priori estimate is defined by the user at monitoring initiation when s = 1, whereas subsequent estimates, θ̂_s, for s = 2, 3, ..., s_t, are the maximum likelihood estimates calculated from the prior periods.
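To make the event-based collection and batch-transmission loop above concrete, the following minimal Python sketch simulates a node with a fixed threshold and synthetic exponential events. The estimator feedback loop (threshold updates) and all energy accounting are omitted, and every name and parameter value here is illustrative rather than taken from the paper:

```python
import numpy as np

def run_sensing_node(stream, tau, k):
    """Event-based collection sketch: accept y >= tau, batch-transmit every k
    accepted values, and report the censored count l_c,s alongside each batch."""
    buffer, censored, batches = [], 0, []
    for y in stream:
        if y < tau:
            censored += 1              # left-censored: occurrence known, value rejected
        else:
            buffer.append(y)           # accepted: held until the data buffer is full
            if len(buffer) == k:
                batches.append((censored, buffer))   # header: l_c,s; payload: k values
                buffer, censored = [], 0
    return batches

rng = np.random.default_rng(0)
batches = run_sensing_node(rng.exponential(2.0, size=1000), tau=1.0, k=8)
```

Each emitted batch mirrors the packet structure of Fig. 1: the number of censored observations followed by the k accepted values.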
Given the event-based data collection and transmission framework, the goal is to derive the unique optimal event value, τ * s , that maximizes the average reward rate (i.e., average information rate) of the collected data given a WSN architecture's energy and data buffer sizes, stochastic models of energy and event arrivals, the value of information, and consideration of temporal death. The system model, which satisfies these constraints, is developed subsequently in Section II.A, and the optimal policy is derived from the system model in Section III. Section IV proves that estimates of the measured process' parameter components can be reconstructed from the subset of transmitted data using a modified likelihood function, where the modified MLE retains desirable properties that are characteristic of the full-information MLE assuming unlimited energy (i.e., all data are transmitted). This policy minimizes the variance of the measured process' parameter component estimates under the resource and hardware constraints.

A. Transmission Subsystem Model
Consider a sensing architecture with a finite-size replenishable energy buffer and a finite-size data storage buffer that can store k observations. While the system has continuous state (i.e., energy) and parameter (i.e., time) spaces, the state space is approximately modeled as a discrete state space. Let X(t) ∈ {0, 1, ..., N} be a finite continuous-time Markov chain with N + 1 states (Fig. 2). The state at time t, X(t) = n, represents the remaining energy, which can support the transmission of n data points. The maximum capacity of the energy storage buffer is E_max, and the amount of energy required to transmit the full data buffer of k values is E_k, which includes the operational overhead of the embedded sensing architecture. The number of points that can be transmitted by a fully charged buffer, E_max, is N = ⌊E_max·k/E_k⌋, where ⌊x⌋ rounds x down to the nearest integer. This Markov model implies that the energy necessary for the batch transmission of k values is divided equally among the k stored values (Λ_s) and "spent" as soon as each value is accepted and stored (i.e., the state transitions from n to n − 1). This is in contrast to reality, where the energy necessary to transmit all k values in the data buffer, E_k, is spent at once during the batch transmission once the data buffer is full, and not incrementally as each value is collected as the Markov model stipulates. This is illustrated in Fig. 3(b), which compares the state transitions of the Markov chain in Fig. 3(a), X(t), with the actual state of charge of the battery, E(t). Fig. 3(b) also shows the corresponding number of data points, ℓ, that are stored in the data buffer at each point in time. Once ℓ = k, the data are transmitted to the remote estimator in batch. Despite energy arriving at times t_1 and t_4, the state and energy level do not increase because the size of the energy buffer is finite and already at maximum capacity, E_max.
At time t_2, a datum, Λ_{1,s}, is stored and the state decrements from X(t_2) = 9 to state 8, indicating that the remaining available energy can support the transmission of eight data points. Although the datum, Λ_{1,s}, is stored and will not be transmitted until the data buffer is full, it is known that the measurement will be sent during the next batch transmission, so the amount of energy necessary to transmit the single datum, E_k/k (where the energy supporting the batch transmission, E_k, is divided equally among the k measurements), is accounted for immediately upon storage in the Markov model. As a result, the modeled state, X(t), is always equal to the SoC, E(t), after every batch transmission. This is illustrated in Fig. 3(b) at times t_5, t_8, and t_12. This modeling assumption is justified for two reasons. First, implementation of the proposed policy is independent of the SoC (i.e., the SoC at time t does not influence the event-based decision making), so the state X(t) need not equal the energy level, E(t), for all time t, and the discrepancy will not influence the steady-state optimal threshold-based policy. Second, the Markov chain assumes that data is collected only if there is enough energy to support the corresponding batch transmission, meaning that if data is collected, it will certainly be transmitted.
The energy storage buffer recharges based on arriving energy, which arrives as a Poisson process [25] with rate β (energy quanta of size E_k per unit time). Here, β = 1/T_k, where T_k is the amount of time it takes to increase the energy level of the energy buffer by E_k (i.e., the amount of time required to support an additional batch transmission). A candidate data point, y, arrives as a memoryless Poisson process with rate λ (events per unit time). The state transitions from state n to state n − 1 when candidate data is collected and stored. Here, the transition rate from state n to state n − 1 is λ·F̄(τ*_s), the event arrival rate multiplied by the probability that an arriving event exceeds the threshold. Formulating a reward process as a discrete-time Markov chain in which state transitions are made at infinitesimal time steps of duration ∆, the transition probability matrix, P_s of size (N+1)×(N+1), corresponding to the Markov chain in Fig. 2 has entries

P_s[n, n−1] = λ·F̄(τ*_s)·∆ for 1 ≤ n ≤ N, P_s[n, min(n+k, N)] = β·∆ for 0 ≤ n ≤ N, and P_s[n, n] equal to the remaining probability. (2)

Here, the duration ∆ is sufficiently small such that the self-transitions (i.e., from state n to state n) satisfy P_s[n,n] ≥ 0. The expected reward corresponding to the optimal threshold, τ*_s, in states 1 through N is

R_s = P[y < τ*_s]·F_c + P[y ≥ τ*_s]·F_y. (3)

The expected reward during each time step is λ·∆·R_s for states 1 ≤ n ≤ N, where λ·∆ is the probability that an event will occur during the next time step, ∆. The time step, ∆, is also assumed to be small enough such that at most one event can occur during the time step. There is no expected reward when n = 0 because the sensing architecture is unable to collect data due to the loss of energy in the buffer. It follows that the reward vector, r_s of size (N+1)×1, is

r_s = λ·∆·R_s·[0 e^T]^T, (4)

where e is a vector of ones. The primary objective of this paper is revisited here given the mathematical notation introduced in this section.
The goal is to derive an optimal threshold vector, τ*_s = τ*_s·e (of size N×1, over states 1 through N), that is optimal in the sense that it maximizes the recurrent class' average reward rate given the recharge rate, β, the candidate data arrival rate, λ, the sizes of the energy and data buffers, N and k, respectively, and a prior estimate of the process parameters, θ̂_1, at the outset of monitoring, as well as consideration of temporal death. Since the average reward rate is defined to be the average rate of information gained about the unknown process, the optimal threshold-based policy guarantees that

g(τ*_s) ≥ g(τ) for all policies τ, (5)

where g is the steady-state reward. Implementation of the optimal threshold-based policy results in the transmission of a subset of observations that minimizes the variance of the measured process' parameter component estimates given the system constraints.
The expected aggregate reward over the first m time steps is

v_m = Σ_{i=0}^{m−1} P_s^i · r_s, (6)

where P_s and r_s are defined in (2) and (4), respectively, and m is the number of time steps of duration ∆ that have occurred. Since (6) does not have a limit as m → ∞, a relative-gain vector, w, is introduced that has a limit given by

w = lim_{m→∞} (v_m − m·g·e). (7)

Here, g·e is the steady-state reward for the Markov chain's recurrent class (where g·e = lim_{m→∞} P_s^m · r_s = e·π·r_s) and π of size 1×(N+1) is the steady-state vector (i.e., π = π·P_s). A proof for the existence of the limits in (7) can be found in Gallager [26]. There is a single unique steady-state vector, π, for the Markov chain shown in Fig. 2 because the Markov chain has finite states and is ergodic.
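As a numerical illustration, the chain, its reward vector, and the steady-state quantities can be assembled directly. The sketch below adopts one specific reading of Fig. 2, namely that each energy arrival delivers a single E_k quantum and therefore advances the state by k (capped at N), while each accepted event decrements the state by one; all parameter values are illustrative, not taken from the paper:

```python
import numpy as np

def build_chain(N=12, k=4, beta=0.5, lam=4.0, Fbar=0.6, F_c=0.1, F_y=1.0, dt=1e-3):
    """P_s and r_s in the spirit of (2) and (4); Fbar stands for P[y >= tau*_s]."""
    P = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n >= 1:
            P[n, n - 1] += lam * dt * Fbar     # accepted event: state decrements
        P[n, min(n + k, N)] += beta * dt       # energy quantum: jump of k states
        P[n, n] += 1.0 - P[n].sum()            # self-transition remainder
    R_s = (1 - Fbar) * F_c + Fbar * F_y        # expected per-event information
    r = lam * dt * R_s * np.ones(N + 1)
    r[0] = 0.0                                 # temporal death: no reward in state 0
    return P, r

def steady_state(P):
    """pi solves pi = pi @ P with sum(pi) = 1, via a stacked least-squares solve."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1); b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

P, r = build_chain()
pi = steady_state(P)
g = pi @ r                                     # steady-state reward per time step
```

The per-step gain g here is exactly the g·e = e·π·r_s quantity of the recurrent class; dividing by ∆ gives the average information rate per unit time.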
Due to the complexity of calculating w from (7) when there are a large number of states, both sides of (7) are multiplied by the transition matrix, P_s, to get the following set of value-determination equations

w + g·e = r_s + P_s·w. (8)

Algorithm 1: Modified HPIA applied to the system model.
Step 1: Start with an arbitrary policy, τ_A = [τ_A, τ_A, ..., τ_A]^T, and calculate w_A and g_A from (8).
Step 2: Evaluate an alternate policy, τ, by applying the contraction mapping T [28] to the value-determination equations.
Step 3: An optimal threshold-based policy is achieved when, for all policies τ,

g(τ*_s) ≥ g(τ). (10)

Since there are N + 1 equations and N + 2 unknowns in (8) and only the relative values of w are important, one component of w can be set to zero. The relative-gain vector, w, then satisfies the equation w + g·e = r_s + P_s·w with w_0 = 0 and has a single unique solution. Analytically, the objective is to determine the unique optimal threshold value, τ*_s, that maximizes the steady-state reward, g, in (8), the solvability of which was proved by Howard [27].
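The value-determination equations (8), with the normalization w_0 = 0, form a square linear system in the unknowns (w_1, ..., w_N, g). A sketch of the solve follows, reusing the same illustrative chain construction (and the same assumed k-state recharge jumps); it also cross-checks g against the steady-state reward π·r_s:

```python
import numpy as np

def build_chain(N=12, k=4, beta=0.5, lam=4.0, Fbar=0.6, F_c=0.1, F_y=1.0, dt=1e-3):
    """Illustrative P_s and r_s; assumes energy arrivals jump the state by k."""
    P = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n >= 1:
            P[n, n - 1] += lam * dt * Fbar
        P[n, min(n + k, N)] += beta * dt
        P[n, n] += 1.0 - P[n].sum()
    r = lam * dt * ((1 - Fbar) * F_c + Fbar * F_y) * np.ones(N + 1)
    r[0] = 0.0
    return P, r

def value_determination(P, r):
    """Solve w + g*e = r + P @ w with the normalization w_0 = 0."""
    n = P.shape[0]
    A = np.zeros((n, n))
    A[:, :n - 1] = (np.eye(n) - P)[:, 1:]      # coefficients of w_1..w_N
    A[:, n - 1] = 1.0                          # coefficient of g
    x = np.linalg.solve(A, r)
    return np.concatenate(([0.0], x[:n - 1])), x[n - 1]

P, r = build_chain()
w, g = value_determination(P, r)
# cross-check: g must equal the steady-state reward pi @ r
pi = np.linalg.lstsq(np.vstack([P.T - np.eye(len(r)), np.ones(len(r))]),
                     np.concatenate([np.zeros(len(r)), [1.0]]), rcond=None)[0]
```

Multiplying (8) on the left by π collapses it to g = π·r_s, which is the identity the cross-check verifies numerically.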

B. Policy Iteration
Howard's Policy Improvement algorithm (HPIA) is used to determine the necessary and sufficient conditions that must be imposed on the value-determination equations in (8) to derive the optimal threshold-based policy that maximizes the average reward rate [27]. HPIA is widely used in the literature, such as in Lei et al. [10]. HPIA consists of two primary stages that are applied sequentially and iteratively: the value-determination stage (Step 1) and the policy-improvement stage (Step 2 and Step 3). To avoid carrying out HPIA iteratively, which becomes computationally intensive as the data buffer size increases, the results of Federgruen et al. [28] are used to define a contraction mapping T : ℝ^p → ℝ^p to evaluate an alternate policy, τ, in Step 2. The application of the modified HPIA to the system model is outlined in Algorithm 1. The remainder of this section implements Algorithm 1 for the replenishable WSN in two parts: value determination and policy iteration.
1) Value Determination for the Replenishable WSN: Starting with an arbitrary policy, τ_A = τ_A·e, and implementing (8), the set of value-determination equations (11a)-(11c) is derived for N > k, with (11a) corresponding to the temporal-death state n = 0. If N = k, then the system of equations is fully described by (11a) and (11c).

2) Policy Iteration for the Replenishable WSN:
We define a contraction mapping T : ℝ^p → ℝ^p to evaluate an alternate policy, τ. Since an optimal threshold-based policy is achieved when, for all policies τ, (10) holds, the first-order optimality condition (12) must be satisfied at τ = τ*_s, where (·)′ denotes the derivative with respect to τ. The necessary and sufficient conditions imposed by Howard [27] are characterized by (12) and imposed on the value-determination equations in (11). Since the optimal threshold must be the same across all states, the optimal threshold, τ*_s, can be determined from a single component of (12), denoted (12a). The component w_{A,k} comes directly from (11) and is substituted into (12a). In (11), if N > k, then w_{A,2k} and w_{A,k−1} can be reduced to functions depending only on w_{A,k}, β, λ, α_s, and R_s; an analogous reduction holds when N = k. The unique optimal threshold value, τ*_s, is calculated from (12a); further simplification toward a closed-form expression requires knowledge of the distribution type of f(y; θ_0). It follows that the optimal threshold-based policy has a maximum average reward rate of β·w*_k.
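The optimization behind Algorithm 1 can also be emulated by brute force: evaluate the steady-state gain g for a grid of candidate thresholds and keep the maximizer. The sketch below assumes an exponential density f(y; θ) = e^(−y/θ)/θ, treats F_c and F_y as fixed placeholder information values (in the paper both depend on θ and the threshold), and reuses the assumed k-state recharge jumps; it closes by checking the stated relation that the achieved average reward rate per unit time equals β·w*_k under this chain structure:

```python
import numpy as np

DT = 1e-3                                      # time-step duration (assumed)

def chain(tau, N=12, k=4, beta=0.5, lam=4.0, theta=2.0, F_c=0.1, F_y=1.0):
    """Illustrative discrete-time chain; Fbar = P[y >= tau] for exponential f."""
    Fbar = np.exp(-tau / theta)
    P = np.zeros((N + 1, N + 1))
    for n in range(N + 1):
        if n >= 1:
            P[n, n - 1] += lam * DT * Fbar     # accepted event: state decrements
        P[n, min(n + k, N)] += beta * DT       # energy quantum arrives
        P[n, n] += 1.0 - P[n].sum()            # self-transition remainder
    r = lam * DT * ((1 - Fbar) * F_c + Fbar * F_y) * np.ones(N + 1)
    r[0] = 0.0
    return P, r

def gain(tau):
    """Solve the value-determination equations; return (g, w)."""
    P, r = chain(tau)
    n = P.shape[0]
    A = np.zeros((n, n))
    A[:, :n - 1] = (np.eye(n) - P)[:, 1:]
    A[:, n - 1] = 1.0
    x = np.linalg.solve(A, r)
    return x[n - 1], np.concatenate(([0.0], x[:n - 1]))

taus = np.linspace(0.0, 8.0, 161)
gs = np.array([gain(t)[0] for t in taus])
tau_star = taus[np.argmax(gs)]                 # grid maximizer of the gain
g_star, w_star = gain(tau_star)
rate = g_star / DT                             # average reward per unit time
```

The closing identity rate = β·w_k follows from the state-0 row of the value-determination equations (no reward and a single β·∆ transition to state k, with w_0 = 0), so it serves as a sanity check on the construction rather than a new result.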

IV. REMOTE PARAMETER ESTIMATION BY MLE USING A MODIFIED LIKELIHOOD FUNCTION
This section proves that the measured process' parameter component estimates can be reconstructed from the subset of data transmitted according to the optimal policy. As illustrated by (5), the optimal threshold-based policy derived in Section III controls data collection such that the transmitted observations maximize the expected Fisher information about the unknown process given the system constraints. If there is unlimited energy supplied to the WSN, the optimal threshold is τ*_s = 0 and all data are collected and transmitted regardless of their value. In this case, the standard likelihood function given full information is

L(θ) = Π_{s=1}^{s_t} Π_{i=1}^{k} f(Λ_{i,s}; θ) (15)

for s_t batch transmissions to the remote estimator. For the case of WSNs under energy constraints considered in this paper, the information that is collected and sent to the remote parameter estimator during batch transmission s reflects only a subset of candidate data. By implementing the optimal threshold-based policy, τ*_s, the missingness of non-transmitted data below the optimal threshold is classified as NMAR [16] because the missingness depends on the unobserved values themselves. Because the rejection of data according to the optimal threshold-based policy is NMAR, rejected data is nonignorable, and the likelihood function characterizing transmitted data must account for left-censored data. Given the optimal event threshold for batch transmission s, the authors introduce a modified likelihood function (16) that allows a maximizer to exist even when data is left censored. Here, l_{c,s} denotes the number of observations that are left censored and rejected while data buffer batch s fills. Under the regulatory conditions, the standard full-information MLE corresponding to the likelihood function in (15) is a consistent, asymptotically normal, and efficient estimator.
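For a single-parameter example, the modified likelihood can be written down and maximized directly. The sketch below assumes an exponential density f(y; θ) = e^(−y/θ)/θ, uses a simple grid search in place of a numerical optimizer, and omits the temporal-death factor that appears in the paper's full formulation; all names and values are illustrative:

```python
import numpy as np

def modified_loglik(theta, accepted, l_c, tau):
    """Left-censored log-likelihood: each censored event contributes log F(tau; theta),
    each accepted value contributes log f(y; theta), for exponential f."""
    F = 1.0 - np.exp(-tau / theta)              # probability mass below the threshold
    return l_c * np.log(F) + np.sum(-np.log(theta) - accepted / theta)

rng = np.random.default_rng(1)
theta0, tau = 2.0, 1.5                          # true parameter and fixed threshold
y = rng.exponential(theta0, size=20000)         # all candidate events
accepted = y[y >= tau]                          # transmitted observations
l_c = int(np.sum(y < tau))                      # censored count (from packet headers)

grid = np.linspace(0.5, 5.0, 2000)              # grid search stands in for an optimizer
theta_hat = grid[np.argmax([modified_loglik(t, accepted, l_c, tau) for t in grid])]
```

Note that the remote estimator needs only the accepted values and the censored counts, exactly the contents of the transmitted packets, to evaluate this likelihood.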
The remainder of Section IV shows that for the modified likelihood function characterized by (16), the MLE is the maximizer, a CRB on the covariance matrix of the estimator exists, and the MLE is consistent, asymptotically unbiased, and asymptotically normal, which lends this modified likelihood function to practical use. For ease of notation, the following proofs will let s t = 1 since the discussion extends naturally for s t > 1.
Assumptions: The following regulatory conditions are considered herein. Let θ ∈ Θ be a p × 1 vector and assume:
A1. Θ is an open subset of ℝ^p;
A2. The PDF f(y; θ) is smooth [29] and differentiable in θ;
A3. The covariance matrix, cov_θ(θ̂), and the Fisher information matrix, F(θ) (defined in Theorem 1), are non-singular matrices;
A4. The support of y, {y : f(y; θ) > 0}, does not depend on θ;
A5. The model is identifiable, meaning that for every θ ∈ Θ, there does not exist another θ̃ ∈ Θ such that f(y; θ) = f(y; θ̃) for all Λ_s in the sample space.
Theorem 1: If the modified log-likelihood of y, ℓ_m(θ), satisfies the regulatory conditions, then the modified log-likelihood is a concave function.
Proof: The modified log-likelihood function, ℓ_m(θ), is a concave function if −∇²_θ ℓ_m(θ) is a positive semi-definite matrix. Let ∇_θ and ∇²_θ denote the gradient and Hessian operators with respect to θ, respectively. The modified log-likelihood function for a single observation is given by (17), where P_0 is the probability that temporal death occurs (i.e., n = 0) and P̄_0 = 1 − P_0. Here, P_0 = π_0, where π_0 is the first component of the steady-state vector, π (which is not a function of Λ_{s,i}).
Step 1: First, it is shown that the gradient of the expectation of the modified log-likelihood function vanishes at the true parameter, which is the unique maximum (see Appendix A),

E_θ[∇_θ ℓ_m(θ)] = 0. (18)

Step 2: Second, letting θ̂ denote an unbiased estimator of θ, it is demonstrated that the correlation between the estimator and the gradient of the log-likelihood is constant (see Appendix A),

E_θ[(θ̂ − θ)·∇_θ ℓ_m(θ)^T] = I_{p×p}, (19)

where I_{p×p} is the identity matrix.
Step 3: The covariance matrix of the concatenated estimator error and gradient gives the relation between the estimator covariance and the Fisher information matrix. Define a random vector U as

U = [(θ̂ − θ)^T, ∇_θ ℓ_m(θ)^T]^T. (20)

Since any matrix expressed as an outer product of two vectors is non-negative definite,

E_θ[U·U^T] ≥ 0. (21)

Using the results of Steps 1 and 2,

E_θ[U·U^T] = [cov_θ(θ̂), I; I, F(θ)]. (22)

Since E_θ[U·U^T] is a partitioned symmetric matrix that is positive semi-definite and it is assumed that F(θ) is a non-singular matrix, then cov_θ(θ̂) is positive semi-definite, F(θ) is positive semi-definite, and

cov_θ(θ̂) ≥ F(θ)^{−1}. (23)

Since, by definition, F(θ) = E_θ[−∇²_θ ℓ_m(θ)] and F(θ) ≥ 0, the modified log-likelihood function, ℓ_m(θ), is a concave function and there exists a unique MLE, θ̂.

Theorem 2: Suppose that the modified log-likelihood of y, ℓ_m(θ), satisfies the regulatory conditions. There exists a CRB on the covariance matrix of the estimator.
Proof: Noting from the regulatory conditions that the covariance and Fisher information matrices are non-singular, the result comes directly from (23) in Theorem 1.
Theorem 3: If the modified log-likelihood of y, ℓ_m(θ), satisfies the regulatory conditions, then the MLE is consistent, where consistency is defined as

θ̂_j →(P) θ_0 as j → ∞,

where j is the number of samples.
Proof: Theorem 1 proves that θ̂ is the value of θ that maximizes the modified log-likelihood in (17). It is now noted that for any θ ∈ Θ, the Law of Large Numbers implies the convergence in probability of (1/j)·ℓ_m(θ) to E_θ[ℓ_m(θ)]. Since, by Theorem 1, E_θ[ℓ_m(θ)] is uniquely maximum at θ_0, then θ̂_j →(P) θ_0.

Theorem 4: If the modified log-likelihood of y, ℓ_m(θ), satisfies the regulatory conditions, then the MLE is asymptotically unbiased and normal, with asymptotic normality defined as

√j·(θ̂_j − θ_0) →(d) N(0, F(θ_0)^{−1}).

Proof: Theorem 1 proves that θ̂ maximizes the modified log-likelihood function, meaning that ∇_θ ℓ_m(θ̂) = 0. From Theorem 3, consistency of the MLE ensures convergence in probability of θ̂_j to θ_0 as j → ∞. This justifies the application of a first-order Taylor expansion of ∇_θ ℓ_m(θ̂) = 0 around θ_0,

0 = (1/√j)·∇_θ ℓ_m(θ_0) + [(1/j)·∇²_θ ℓ_m(θ_0)]·√j·(θ̂_j − θ_0). (28)

By the Law of Large Numbers, the second term in (28) satisfies

(1/j)·∇²_θ ℓ_m(θ_0) →(P) −F(θ_0) (29)

in probability. Further, by the Central Limit Theorem, since ∇_θ ℓ_m(θ_0) has mean zero and covariance F(θ_0), the first term in (28) can be expressed as

(1/√j)·∇_θ ℓ_m(θ_0) →(d) N(0, F(θ_0)) (30)

in distribution. It is concluded by substituting (29) and (30) into (28) that, by the Continuous Mapping Theorem and Slutsky's Lemma, (28) holds and the MLE is asymptotically unbiased and asymptotically normal.
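Theorems 3 and 4 can be probed with a small Monte Carlo experiment: repeated censored-MLE fits should concentrate around θ_0 as the number of samples j grows, with shrinking spread. The sketch below reuses the illustrative censored-exponential model from earlier (hypothetical parameters throughout, grid search in place of a numerical optimizer):

```python
import numpy as np

def censored_mle(y, tau, grid):
    """Grid-search MLE for the left-censored exponential model, vectorized over grid."""
    acc = y[y >= tau]                           # accepted (transmitted) values
    l_c = np.sum(y < tau)                       # censored count
    ll = (l_c * np.log(1.0 - np.exp(-tau / grid))
          - acc.size * np.log(grid) - acc.sum() / grid)
    return grid[np.argmax(ll)]

rng = np.random.default_rng(2)
theta0, tau = 2.0, 1.5
grid = np.linspace(0.5, 5.0, 1000)
means, spreads = {}, {}
for j in (200, 5000):                           # growing sample sizes
    est = np.array([censored_mle(rng.exponential(theta0, size=j), tau, grid)
                    for _ in range(200)])       # 200 Monte Carlo replicates
    means[j], spreads[j] = est.mean(), est.std()
```

Consistent with the theorems, the replicate mean sits near θ_0 and the replicate spread shrinks as j increases (roughly as 1/√j, up to the grid resolution).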

V. RESULTS
Within the civil, mechanical, and aerospace engineering domains, the research community has widely embraced the use of long-term monitoring data for structural health monitoring (SHM). An integral part of SHM is the characterization of structural or mechanical response in critical assets, where the estimate is used to assess the probability of failure of the monitored asset [37]. SHM systems using WSNs emerged in the mid-1990s and have grown in popularity as a lower-cost and easily deployable alternative to traditional wired sensing systems [38]. The operation of WSNs used for SHM typically relies on solar harvesting; given energy resource constraints, a WSN should be able to update parameter estimates of structural response as frequently as possible to track the asset's condition.
Consider the replenishable sensing system shown in Fig. 4. An Urbano wireless sensing architecture [39] is powered by a replenishable battery [40] that harvests energy from a solar panel [41] via a solar controller [42]. Since the maximum capacity of the battery considered is 58%, a MOSFET circuit [43] disconnects the solar panel when the battery's state of charge (SoC) exceeds 50%, as measured by a fuel gauge [44]. To operate the lithium-polymer battery safely, when the measured SoC drops below 20%, the battery is considered "dead" and all arriving events are rejected. Setting practical lower and upper thresholds on the battery SoC minimizes the high discharge and charging stresses that can reduce battery life and cause low discharge efficiency. The solar panel is placed 0.2 meters from a 600-watt halogen light source in an otherwise dark room. The Urbano wireless sensing architecture, which is operated by an ATMega2561V microcontroller [45], continuously reads maximum strain measurements that are simulated by an external Arduino UNO [46]. To illustrate practical uses of this work, the strain measurements are intended to reflect the structural response induced by trucks crossing a highway bridge, an application area considered widely across the field of SHM [47], [48], [49]. Batch transmission is carried out using a NimbeLink cellular modem [50] configured with a Taoglas hexa-band cellular antenna [51].
Following the procedure outlined in Section II, the experimental setup is modeled by the Markov Chain in Fig. 2 and has the following parameters:
• The solar energy harvesting is modeled as a Poisson process [52] whose energy arrival rate for the experimental setup is β = 0.16 (energy level/minute);
• Trucks cross the bridge following a Poisson process [53] with arrival rate λ = 8 (events/minute);
• The measured process (i.e., the maximum strain value induced by a crossing truck) is exponentially distributed with true parameter θ_0 = η, where the rate parameter is η = 0.3;
• The battery can support four batch transmissions, where each batch transmission contains k = 5 messages (i.e., N = 20);
• Several initial rate parameters, θ̂_1, are considered (specified subsequently in Fig. 8), representing the original belief about the measured process at the outset of monitoring.
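To make the model concrete, the parameters above can be dropped into a toy discrete-event simulation of the sensing node. This sketch is illustrative only: the function name, the fully charged initial state, instantaneous batch transmission, and a one-energy-unit cost per transmitted message (consistent with N = 20 supporting four k = 5 batches) are assumptions, not details taken from the paper's system model.

```python
import random

def simulate_node(beta, lam, eta, tau, N=20, k=5, minutes=600, seed=0):
    # Toy discrete-event simulation of the sensing node described above.
    # Assumptions: node starts fully charged, each transmitted message
    # drains one energy unit, transmission is instantaneous, and the
    # energy level is floored at zero (toy simplification).
    rng = random.Random(seed)
    energy, buf, transmitted = N, 0, 0
    next_energy = rng.expovariate(beta)  # next harvested-energy arrival (min)
    next_event = rng.expovariate(lam)    # next truck-crossing event (min)
    while min(next_energy, next_event) < minutes:
        if next_energy < next_event:
            energy = min(N, energy + 1)            # finite energy buffer
            next_energy += rng.expovariate(beta)
        else:
            y = rng.expovariate(eta)               # max strain of this event
            if energy > 0 and y >= tau:            # threshold (censoring) policy
                buf += 1
                if buf == k:                       # batch transmission
                    energy = max(0, energy - k)
                    transmitted += k
                    buf = 0
            next_event += rng.expovariate(lam)
    return transmitted

# Experimental parameters with the optimal threshold reported in Fig. 5.
print(simulate_node(beta=0.16, lam=8, eta=0.3, tau=8.14))
```

Setting k=1 mimics the no-data-buffer case compared in Fig. 7(b), and sweeping tau reproduces the qualitative over- and under-collection behavior discussed next.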
Considering the experimental input parameters, Fig. 5 shows the average reward rate calculated from (12a) for varying threshold values, assuming θ̂_1 = 0.3, and compares the limited-energy case to the unlimited-energy case. As τ_s ≫ τ*_s, the probability of storing candidate data (i.e., y ≥ τ_s) decreases because so few measurements are observed in the upper extremum (requiring very infrequent transmission), and the limited-energy and unlimited-energy average reward rates converge. Conversely, as τ_s ≪ τ*_s, the availability of energy becomes a bottleneck when energy is limited. As the probability of storing candidate data increases with τ_s → 0, the energy buffer cannot support all of the transmissions and is repeatedly depleted (i.e., the steady-state probability that X(t) = 0, P_0, increases). This decreases the average reward rate because no information is earned from data that arrives while the sensing node is shut down. Hence, when the threshold, τ_s, is higher than the optimal threshold, τ*_s, the data collection and transmission process is inefficient and underutilizes the available energy; when τ_s is lower than τ*_s, the WSN attempts to collect too much data, which increases the probability, P_0, that the system is in a non-operating state because the energy buffer is fully depleted. Fig. 5 also shows that, as expected, the average reward rate corresponding to the optimal threshold increases as the energy arrival rate increases (corresponding to β = 0.8 and β = 1.6).
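The competing effects just described can be sketched with a stylized birth-death model of the energy buffer (recharge rate β, drain rate λe^{−ητ_s} under the illustrative assumption that every stored message costs one energy unit). This is a simplified proxy, not the reward rate (12a): it ignores the data buffer and batching, but it shows numerically that lowering τ_s raises the chance the energy buffer is empty (P_0) while raising τ_s starves the estimator of transmissions.

```python
import math

def p_store(eta, tau):
    # Probability that an exponential measurement clears the threshold.
    return math.exp(-eta * tau)

def p_empty(beta, mu, N=20):
    # Steady-state probability that a finite birth-death energy buffer
    # (recharge rate beta, drain rate mu, capacity N) sits at level 0.
    if abs(beta - mu) < 1e-9:
        return 1.0 / (N + 1)
    r = beta / mu
    return (1.0 - r) / (1.0 - r ** (N + 1))

beta, lam, eta = 0.16, 8.0, 0.3
for tau in (2.0, 8.0, 14.0):
    mu = lam * p_store(eta, tau)  # rate at which stored data drains energy
    print(f"tau={tau:4.1f}  P(store)={p_store(eta, tau):.3f}  "
          f"P0={p_empty(beta, mu):.3f}")
```

Low thresholds yield P_0 near one (the node is almost always depleted), while high thresholds make transmissions vanishingly rare, mirroring the two inefficient regimes on either side of τ*_s in Fig. 5.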
Implementing the optimal threshold, τ*_s, calculated from (12a) (shown in Fig. 5) in the experimental setup, Fig. 6 shows the variance of the estimated rate parameter, η, resulting from 24 one-hour experimental data collection periods at 15 different threshold values. The experimental results closely reflect the variance of the full system simulated in Matlab [54] (Fig. 6); the simulated environment allows rapid convergence of the variance given 10,000 data collection iterations for each threshold value. As expected from (5) and illustrated in Fig. 6, when β = 0.16, the variance of the estimated process parameter is minimized when the optimal threshold value, τ*_s = 8.14, is implemented. Since the results in Fig. 6 demonstrate the feasibility of implementing the theoretical work in a realistic experimental setting, the remainder of the results herein leverage the simulation environment.
An important part of the proposed work is to illustrate that performance gains are achieved by accounting for a data buffer in the system-level model. Considering a 10-hour data collection period, Figs. 7(a)-(b) illustrate the reduction in the variance of the estimated rate parameter, η, that results from consideration of a data buffer of size k = 5 (Fig. 7(a)) as compared to no data buffer, meaning k = 1 (Fig. 7(b)). Figs. 7(a)-(b) also illustrate that as the energy arrival rate increases (corresponding to β = 0.8 and β = 1.6), the optimal threshold decreases, and the variance of the rate parameter, η, decreases with it. Extending the results of this work beyond the specific parameters of the experimental setup, Fig. 7(c) shows the variance of the rate parameter, η, that results from implementation of the optimal threshold, τ*_s, for varying event arrival rates, λ, energy arrival rates, β, and data buffer sizes, k.
The robustness of the proposed method is illustrated through the injection of uncertainty into the initial estimate of the measured process, θ̂_1. Fig. 8 shows the convergence of the rate parameter's variance over 1,000 iterations of a 10-hour data collection period where the energy arrival rate is β = 0.16, the event arrival rate is λ = 8, and initial estimates of θ̂_1 = 0.3 (i.e., the true process parameter), θ̂_1 = 0.2, and θ̂_1 = 0.4 are considered. Despite having incorrect prior parameter estimates, within the 10-hour data collection period the variance of the rate parameter, η, for each case converges to the variance corresponding to implementation of the optimal threshold that results from the true process parameter, θ_0. This variance is indicated by the dashed line in Fig. 8, which is the variance of the rate parameter, η, identified at point B in Fig. 7(c).
Fig. 8: Variance of the rate parameter, η, over the 10-hour monitoring period for 2,500 iterations with three initial estimates of the measured process parameter defined a priori.

VI. CONCLUSIONS

This paper presents the derivation of an optimal, event-based data collection and transmission policy for remote parameter estimation in wireless sensing architectures subject to resource and hardware constraints. Given a WSN with a finite-size replenishable energy buffer, a finite-size data buffer, stochastic models of energy and event arrivals, the value of information of measured data, and consideration of temporal death, the proposed policy controls the storage and transmission of observed data such that the average Fisher information rate of observed and censored observations is maximized. This paper proves that the measured process' parameter component estimates can be reconstructed from the subset of data transmitted according to the optimal policy using a modified likelihood function. Under regularity conditions, the proposed modified MLE retains desirable properties that are characteristic of the full-information MLE assuming unlimited energy (i.e., all data are transmitted): the modified MLE, which accounts for the missingness of data not transmitted, is the unique maximizer of the modified likelihood; a CRB on the covariance matrix of the estimator exists; and the MLE is consistent, asymptotically unbiased, and asymptotically normal. Since the average information rate is defined as the average rate of Fisher information gained about the unknown process component parameters, the optimal threshold-based policy guarantees the transmission of a subset of observations that minimizes the variance of the measured process' parameter component estimates given the system constraints.
Consideration of a data buffer, and corresponding batch transmission, within the system model leads to significant gains in reducing the variance of the parameter component estimates and broadens the scope of its possible applications.
The developed theoretical framework is implemented in both experimental and simulation-based settings, where data collection and transmission in an EH WSN architecture are controlled to estimate the measured process' parameter subject to stringent energy and hardware constraints. The experimental results closely reflect the results of the system-level simulation. The experimental and simulation-based results reflect the objectives of the theoretical derivation: given that the EH WSN architecture is subject to energy constraints imposed by the availability of incoming energy and battery size, the derived optimal threshold, τ*_s, produces the best possible estimate of the process parameters given the system constraints, and performance gains are achieved by accounting for a data buffer. The results illustrate that the proposed framework is robust against uncertainty injected into the initial estimate of the measured process at the outset of monitoring.
To account for energy and event arrival rates that change over time, which is common for WSNs relying on energy harvesting from the environment, next steps will aim to incorporate predictive estimation of the recharge and event arrival rates, β and λ, respectively, as well as time-varying arrival processes. Short-term forecasting will also enable the integration of online supervisory control schemes that dynamically control whether the sensing architecture accepts measurements or sits in a low-power sleep state in which all incoming data are discarded, even when the energy buffer is not fully depleted. An important step toward using the proposed policy in real-world applications will be to account for communication latency by introducing an additional state during transmission in which the WSN is unable to measure and store arriving data.