Semantics-Aware Active Fault Detection in Status Updating Systems

With its growing number of deployed devices and applications, the Internet of Things (IoT) raises significant challenges for network maintenance procedures. In this work we address the problem of active fault detection in an IoT scenario, whereby a monitor can probe a remote device in order to acquire fresh information and facilitate fault detection. However, probing can have a significant impact on the system's energy and communication resources. To this end, we utilize Age of Information as a measure of the freshness of information at the monitor and adopt a semantics-aware communication approach between the monitor and the remote device. In semantics-aware communications, the processes of generating and transmitting information are treated jointly to consider the importance of information and the purpose of communication. We formulate the problem as a Partially Observable Markov Decision Process and show analytically that the optimal policy is of a threshold type. Finally, we use a computationally efficient stochastic approximation algorithm to approximate the optimal policy and present numerical results that exhibit the advantage of our approach compared to a conventional delay-based probing policy.

detection procedures is necessary to safely and efficiently operate an IoT network. The majority of fault detection algorithms proposed in the past [1], [2] assume that the system is passively monitored and utilize statistical or machine learning techniques to infer the actual health status of its subsystems. However, a major drawback of passive monitoring is that faults can pass undetected when faulty and nominal operation overlap due to measurement and process uncertainties, or when control actions mask the influence of faults [3]. To address this problem we make use of an active fault detection scheme that utilizes probes to affect the system's response and thus increase the probability of detecting certain faults.
With active fault detection special care must be taken so that the extra network traffic due to probing is not detrimental to the system's performance. This requirement can become a significant challenge if active fault detection is to be deployed in IoT networks with a large number of remote devices. Blindly generating and transmitting probes could increase network congestion and prevent other applications from satisfying their possibly strict real-time constraints. To this end, we adopt a semantics-aware [4] approach to active fault detection. Within the context of semantics-aware communications, the generation and transfer of information across a network are considered jointly in order to take into account the goal or purpose of the communication. What is more, the importance/significance of a communication event, i.e., the event of generating and transmitting information, constitutes the decisive criterion of whether it should take place or not. The definition of the importance of a communication event is application-specific; thus, in the context of active fault detection, we define it to be a function of the freshness of the information that has been received from the remote device and of the operational status of the communication network and the remote device. Put simply, the importance of a probe increases when the information sent by the remote device has become stale and the probing entity is not confident that the operational status of the communication network and the remote device is good. Numerical results indicate that, in contrast to the classical communication paradigm where information generation and transmission are treated separately, the semantics-aware approach offers significant advantages.
More specifically, in this work, we consider a basic active fault detection scenario for a discrete-time dynamic system that is comprised of a sensor and a monitor. At the beginning of each time slot, the sensor probabilistically generates and transmits status updates to the monitor over an unreliable link, while the monitor decides whether or not to probe the sensor for a mandatory transmission of a fresh status update through a separate unreliable link. By the end of each time slot, the monitor may or may not receive a status update, either because none was generated at the sensor or due to intermittent faults at the sensor and the wireless links. To detect intermittent faults, the monitor maintains a belief vector, i.e., a probability distribution, over the operational status (healthy or faulty) of the system and a measure of its confidence in this belief vector that is expressed by the entropy of the belief vector. Probing, successfully or unsuccessfully, increases the confidence of the monitor in its belief state; however, it also induces a cost for the monitor that measures the negative impact of probing on the system's energy and communication resources. Our objective is to find a policy that decides at each time slot whether or not a probe should be sent to the sensor so that it optimally balances the probing cost with the need for fresh information at the monitor.
Our approach to solving this problem is to formulate it as a Partially Observable Markov Decision Process (POMDP) and derive the necessary conditions for probing to result in a reduction of the entropy of the belief vector. To the best of our knowledge, this is the first work to take this approach. Our analysis indicates that there exist probing cost values such that the optimal policy is of a threshold type. In addition, we propose a stochastic approximation algorithm that can compute such a policy and, subsequently, we evaluate the derived policy numerically.

A. Related work
Fault detection methods can be categorized as passive, reactive, proactive, and active. Passive fault detection methods collect information from the data packets that the wireless sensors exchange as part of their normal operation, whereas in reactive and proactive fault detection methods the wireless sensors collect information related to their operational status and subsequently transmit it to the monitor. Finally, in active fault detection methods the monitor probes the wireless sensors for information specific to the fault detection process. In Wireless Sensor Networks (WSNs) the fault detection algorithms cited in recent works and surveys [2], [5]-[7] fall into the passive, reactive, and proactive categories, with the majority of them being passive fault detection algorithms. Active fault detection methods for WSNs have received limited attention [6] compared to the other three categories. In [8] and [9] the authors adopted an active approach to fault detection in WSNs. However, both of these tools were meant for pre-deployment testing of WSN software rather than as a health status monitoring mechanism.
Unlike these works, we propose an active fault detection method for continuously monitoring the health status of sensors. We believe that autonomous active fault detection methods can successfully complement the passive ones by addressing their limitations. More specifically, passive fault detection methods often fail to detect faults because the faulty and the nominal operation overlap due to measurement and process uncertainties. What is more, network control mechanisms specifically designed to increase the robustness of the IoT network, e.g., by delegating the job of a sensor to neighboring or redundant nodes, often compensate for the performance degradation due to intermittent faults and thus mask their influence, rendering them undetectable [3]. Acknowledging the fact that the network overhead due to active fault detection can be prohibitive, we adopted the semantics-aware communication paradigm [4], [10]-[16], which has exhibited its ability to eliminate the transmission of redundant and uninformative data and thus minimize the induced overhead.
Active fault detection methods have also been studied in the context of wired networks [17]-[22]. However, the operational conditions of wired networks differ considerably from those of WSNs in terms of protocols, energy, bandwidth, and transmission errors, so techniques proposed for wired networks cannot be applied in WSNs.

II. SYSTEM MODEL
We consider the system presented in Figure 1. It is comprised of a sensor that transmits status updates to a monitoring device over the wireless link labeled SM. The monitoring device, besides receiving the status updates from the sensor, is able to probe the sensor for a fresh status update over the wireless link labeled MS. Transmissions over the MS and SM links are subject to failure. Failures are independent between the two links. We assume that time is slotted and indexed by t ∈ Z+.
Fig. 1: Basic IoT setup.
The state of the sensor is modeled as an independent two-state time-homogeneous Markov process. Let F^S_t ∈ {0, 1} be the state of the sensor's Markov process at the beginning of the t-th time slot. When F^S_t has a value of 0/1, the sensor's operational status is healthy/faulty.
We assume that the sensor will remain in the same state for the duration of a time slot and, afterwards, it will make a probabilistic transition to another state as dictated by the state transition probability matrix P^S. Furthermore, at the beginning of each time slot the sensor will generate a status update with probability P_g when in a healthy state, while it will not generate a status update when in a faulty state. In this work we assume that P_g < 1, since otherwise the probing system is redundant. In the case of a status update generation the sensor will transmit it over the SM link. At the end of the time slot the status update is discarded independently of the outcome of the transmission.
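The sensor dynamics above can be sketched in a few lines; a minimal simulation, with illustrative values for P^S and P_g (the section does not fix numerical values here):

```python
import random

# Illustrative parameters (not from the paper): P_S is the 2x2 sensor
# transition matrix (0 = healthy, 1 = faulty) and P_g the generation
# probability in the healthy state.
P_S = [[0.9, 0.1],
       [0.3, 0.7]]
P_g = 0.6

def step_sensor(state: int) -> int:
    """Sample the sensor's next health state from row `state` of P_S."""
    return 0 if random.random() < P_S[state][0] else 1

def generates_update(state: int) -> bool:
    """A healthy sensor generates an update w.p. P_g; a faulty one never does."""
    return state == 0 and random.random() < P_g
```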
Similarly, we model the health statuses of the wireless links as two independent two-state time-homogeneous Markov processes, with F^MS_t and F^SM_t denoting their states at the beginning of the t-th time slot. We assume that the wireless links will remain in the same state for the duration of a single time slot and, subsequently, they will make a transition to another state as dictated by the transition probability matrices P^MS and P^SM respectively. When in a healthy state, a wireless link will successfully forward a status update to the monitor with probability 1, whereas a faulty wireless link will always fail to deliver the status update.
In this work, we consider the problem of an autonomous agent that optimally probes the sensor to maximize the aggregate Value of Information (VoI) over a finite horizon. The VoI metric depends on how confident the agent is about the system's health status as well as on the freshness of the status updates it receives. Since the agent doesn't have access to the true health status of the system, it has to maintain a belief state P_t, i.e., a probability distribution over all possible states of the system at time t, based on the observations it makes, i.e., the arrival of status updates. Using P_t, the agent derives P^h_t, which represents its belief about the health status of the subsystem of Figure 1 that is comprised of the sensor and the l_SM link, i.e., the subsystem that is responsible for the generation and transmission of the status updates. In the following sections we will refer to this subsystem as the Generation-Transmission (GT) subsystem. P^h_t is a probability distribution over two states that correspond to a healthy and a faulty GT-subsystem respectively. We represent the confidence level of the agent, regarding the health status of the GT-subsystem, with the information entropy of P^h_t, i.e., H(P^h_t), which, for simplicity, we denote with H_t. The derivation of P^h_t from P_t and the formula for the computation of H_t are presented in Section III, right after the analytical definition of the belief state. In the following sections we will refer to H_t as the health status entropy of the belief state P_t. A low entropy value means that the agent is confident about the health status of the GT-subsystem, while a high entropy value means that the agent is less confident about it.
Furthermore, to characterize the freshness of the status updates received at the monitor we utilize the Age of Information (AoI) metric that has received significant attention in the research community [23]-[31]. AoI was defined in [32] as the time that has elapsed since the generation of the last status update that has been successfully decoded by the destination, i.e., ∆(t) = t − U(t), where U(t) is the time-stamp of the last packet received at the destination at time t. We use ∆_t, t = 0, 1, . . ., N, to denote the AoI of the sensor at time t. However, as the time horizon of the optimal probing problem increases, ∆_t could assume values that would be disproportionately larger than H_t. To alleviate this problem we will use a normalized value of the AoI, which we define as ∆̄_t = ∆_t / N, where N is the length of the finite horizon measured in time slots. Finally, VoI is defined as

V_t = λ_1 H_t + λ_2 ∆̄_t,    (1)

where λ_1 and λ_2 are weights that determine the relative importance of each component of the metric.
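The two components of VoI combine as a weighted sum; a small sketch, assuming the form V_t = λ_1 H_t + λ_2 ∆_t/N and base-2 logarithms (the text does not state the logarithm base):

```python
import math

def health_entropy(p_h: float) -> float:
    """Binary entropy H_t of the health-status belief [p_h, 1 - p_h] (base 2)."""
    if p_h in (0.0, 1.0):
        return 0.0
    p_f = 1.0 - p_h
    return -(p_h * math.log2(p_h) + p_f * math.log2(p_f))

def voi(p_h: float, aoi: int, horizon: int,
        lam1: float = 1.0, lam2: float = 1.0) -> float:
    """V_t = lam1 * H_t + lam2 * (Delta_t / N): confidence plus staleness."""
    return lam1 * health_entropy(p_h) + lam2 * aoi / horizon
```

For example, a maximally uncertain belief (p_h = 0.5) halfway through a 100-slot horizon gives V_t = 1 + 0.5 with unit weights.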
At the beginning of each time slot the agent will consider its belief about the system's state and the VoI of the system, and it will decide whether or not to probe the sensor for a fresh status update. If the agent decides to send a probe it will pay a cost of c units. Following a successful reception of a probe through the MS link, the sensor will generate a fresh status update at the next time slot with probability 1, provided that it is in a healthy state.

Actions (A):
The set of actions available to the agent is denoted by A = {0, 1}, where 0 represents the no-probe action and 1 the probe action. The result of the probe action, given that the probe is successfully received through the MS link, is that the sensor will generate a fresh status update at the next time slot w.p. 1 if it is in a healthy state, and w.p. 0 if it is in a faulty state. Both actions are available in all system states. Finally, we denote the action taken by the agent at the beginning of the t-th time slot by a_t ∈ A. The action taken by the agent does not directly affect the state of the system; nevertheless, it affects the observation made by the agent.

Random variables:
The state of the system presented in Fig. 1 will change stochastically at the beginning of each time slot. The transition to the new state is governed by the transition probability matrices P^MS, P^S and P^SM, which were presented in Section II, and the state of the system during the previous time slot.
As mentioned above, the agent has no knowledge of the system's actual state and is limited to observing the arrival of status updates. The observations are stochastic in nature and are determined by the action taken by the agent, the state of the system and the following random variables. The random variable W^g_t ∈ {0, 1} represents the random event of a status update generation at the t-th time slot. If a status update is generated by the sensor then W^g_t takes the value 1, and if the sensor does not generate a status update then W^g_t takes the value 0. We have the following conditional distribution for W^g:

P[W^g = 1 | F^S = 0] = P_g,    P[W^g = 1 | F^S = 1] = 0,

where we omitted the time index since the distribution is assumed to remain constant over time.
The random variable W^MS_t ∈ {0, 1} represents the random event of a successful transmission over the MS link during the t-th time slot. A value of 0 indicates an unsuccessful transmission over the link and a value of 1 indicates a successful transmission. The conditional probability distribution for W^MS is given by

P[W^MS = 1 | F^MS = 0] = 1,    P[W^MS = 1 | F^MS = 1] = 0,

where again we omitted the time index t. Finally, the random variable W^SM_t ∈ {0, 1} represents the random event of a successful transmission over the SM link during the t-th time slot. A value of 0 indicates an unsuccessful transmission over the link and a value of 1 indicates a successful transmission. The conditional probability distribution for W^SM is given by

P[W^SM = 1 | F^SM = 0] = 1,    P[W^SM = 1 | F^SM = 1] = 0.

Transition probabilities (P): Let m be an index over the set of the three subsystems presented in Fig. 1, i.e., m ∈ {MS, S, SM}. The transition probability matrices P^MS, P^S and P^SM can then be defined as

P^m = [ p^m_00  p^m_01 ; p^m_10  p^m_11 ],

where p^m_00 represents the probability of making a transition from a healthy state (0) to a healthy state (0) for subsystem m. Transition probabilities p^m_01, p^m_10, and p^m_11 are defined in a similar way. Furthermore, we introduce the shorthand notation s_t = [F^MS_t, F^S_t, F^SM_t]^T and s_{t+1} for the system state at times t and t + 1 respectively, so that the conditional probability distribution of the next state s_{t+1} given the current state s_t can be expressed as p_ij = P[s_{t+1} = j | s_t = i].
TABLE I: Observation probabilities r_s(a, z) as a function of the health status of the sensor (F^S), of the MS link (F^MS), of the SM link (F^SM), the action (a_{t−1}) and the observation z_t.
Observations (Z): At the beginning of each time slot the agent observes whether a status update arrived or not. Let z_t ∈ {0, 1} denote the observation made at the t-th time slot, with 0 representing the event that no status update was received and 1 representing the event that a status update was received. We define r_s(a, z) as the probability of making observation z at the t-th time slot, i.e., z_t = z, given that the system is in state s, i.e., s_t = s, and the preceding action was a, i.e., a_{t−1} = a. Thus we have r_s(a, z) = P[z_t = z | s_t = s, a_{t−1} = a]. The expressions of r_s(a, z) for all combinations of states and actions are presented in Expressions (5), (6), (7), and (8). By utilizing the conditional probability distributions presented in Section III we derived the observation probabilities for all possible combinations of states and controls and present them in Table I.
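The entries of Table I follow mechanically from the model; a sketch of P[z = 1 | s, a], under the simplifying assumption that the link states in s govern both the probe delivery and the update transmission:

```python
def r_obs(f_ms: int, f_s: int, f_sm: int, a: int, p_g: float) -> float:
    """P[z = 1 | s = (f_ms, f_s, f_sm), a]: probability that a status update
    is received, mirroring the structure of the Table I entries
    (0 = healthy, 1 = faulty for each subsystem)."""
    if f_s == 1 or f_sm == 1:   # faulty sensor or faulty SM link: nothing arrives
        return 0.0
    if a == 1 and f_ms == 0:    # a delivered probe forces a generation
        return 1.0
    return p_g                  # otherwise an update is generated w.p. P_g

# P[z = 0 | s, a] is simply the complement 1 - r_obs(...).
```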
The evolution of the AoI value over time depends on the observation made by the agent:

∆_{t+1} = 1 if z_{t+1} = 1, and ∆_{t+1} = min(∆_t + 1, N) otherwise,

where N is the finite time horizon of the optimization problem.
Transition cost function (g, g_N): At the end of each time slot, the agent is charged with a cost that depends on the VoI and the action taken by the agent as follows:

g(x_t, a_t) = V_t + c · 1{a_t = 1},

where 1{a_t = 1} is the indicator function, which takes a value of 1 when the probe action was taken by the agent and a value of zero otherwise, and V_t is computed using Equation (1). Parameter c is a cost value associated with probing and quantifies the consumption of system resources for the generation and transmission of a probe. The use of VoI as a cost metric is justified by the fact that it expresses how much the agent is in need of a fresh status update from the sensor.
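A one-line sketch of the stage cost, assuming the additive form g(x_t, a_t) = V_t + c·1{a_t = 1} implied by the text:

```python
def stage_cost(v_t: float, a_t: int, c: float) -> float:
    """g(x_t, a_t) = V_t + c * 1{a_t = 1}: the current VoI plus the probing
    cost c whenever the probe action (a_t = 1) was taken."""
    return v_t + (c if a_t == 1 else 0.0)
```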
Total cost function: In a POMDP the agent doesn't have access to the current state of the system; thus, to optimally select actions it must utilize all previous observations and actions up to time t [33, Chapter 4]. Let h_t = [z_0, z_1, . . ., z_t, a_0, a_1, . . ., a_{t−1}] be the history of all previous observations and actions, with h_0 = {z_0}. Furthermore, let H be the set of all possible histories for the system at hand. The agent must find a policy π* that maps each history in H to a probability distribution over actions, i.e., π : H → P(A), so that the expected value of the total cost accumulated over a horizon of N time slots is minimized. Let Π be the set of all feasible policies for the system at hand. Then, assuming that the agent follows policy π ∈ Π starting from an initial history h_0, the expected value of the total cost accumulated over a horizon of N time slots is

J_{π,N}(h_0) = E{ g_N(x_N) + Σ_{t=0}^{N−1} g(x_t, a_t) },

where the expectation E{·} is taken with respect to the joint distribution of the random variables [W^g_t, W^MS_t, W^SM_t]^T for t = 0, 1, . . . and the given policy π. Our objective is to find the optimal policy π*, which is defined as π* = arg min_{π∈Π} J_{π,N}(h_0).

February 3, 2022 DRAFT
For finite N the optimal policy π* can be obtained by using the dynamic programming algorithm. However, the difficulty with this approach is that the dynamic programming algorithm is carried out over a state space of expanding dimension. As new observations are made and new actions are taken, the dimension of h_t increases accordingly. To overcome this difficulty, h_t can be replaced by a sufficient statistic, i.e., a quantity that summarizes all the essential content of h_t that is necessary for control purposes. In the POMDP literature a sufficient statistic that is often used is the belief state, which is presented in the following section.
Belief State: At each time slot t the agent maintains a belief state P_t, i.e., a probability distribution over all possible system states, P_t = [p^0_t, . . ., p^7_t]^T. Starting from an arbitrarily initialized belief state P_0, the agent updates its belief about the actual state of the system at the beginning of each time slot as follows:

p^j_{t+1} = r_j(a_t, z_{t+1}) Σ_{i∈I_S} p^i_t p_ij / ( Σ_{j'∈I_S} r_{j'}(a_t, z_{t+1}) Σ_{i∈I_S} p^i_t p_{ij'} ),

where p_ij = P[s_{t+1} = j | s_t = i]. In the literature p_ij is usually a function of the action selected at time t, i.e., p_ij(a_t); however, in our case the actions taken by the agent do not affect the system's state. In any case, the action taken by the agent affects the observation z_{t+1} made by the agent and thus directly affects the evolution of the belief state over time. As mentioned in Section I, based on P_t the agent forms the health status belief vector P^h_t that represents our belief regarding the health status of the sub-system comprised of the sensor and the l_SM link. We have P^h_t = [p^h_t, p^f_t]^T, where p^h_t and p^f_t represent, respectively, the probabilities for the sub-system to be in a healthy or a faulty state. We define p^h_t = p^0_t + p^4_t, since the states with index 0 and 4 in Table I are the only states where both the sensor and the l_SM link are in a healthy state. Correspondingly, we define p^f_t = Σ_{i∈I_S\{0,4}} p^i_t. It holds that P^h_t is a probability distribution, since p^h_t and p^f_t are computed over complementary subsets of the system's state space and P_t is a probability distribution. Finally, the health status entropy is computed as

H_t = −p^h_t log p^h_t − p^f_t log p^f_t.

For the agent to have all the information necessary to proceed with the decision process, it must also keep the value of the AoI as part of its state; thus we augment the belief state with the value of the AoI and define the following representation of the current state, i.e., x_t = [P_t, ∆̄_t], and define X to be the set of all states.
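The belief recursion and the health-status collapse can be sketched as follows; `r` is a stand-in callback for the Table I probabilities, and the state indexing (0 and 4 healthy) follows the text:

```python
def belief_update(P_t, P_trans, r, a, z):
    """One step of the standard POMDP belief filter: predict through the
    joint transition matrix (entries p_ij), weight by the observation
    likelihoods r(j, a, z), and normalize."""
    n = len(P_t)
    pred = [sum(P_t[i] * P_trans[i][j] for i in range(n)) for j in range(n)]
    post = [r(j, a, z) * pred[j] for j in range(n)]
    total = sum(post)
    return [p / total for p in post]

def health_status_belief(P_t):
    """Collapse the 8-state belief into [p_h, p_f]; states 0 and 4 are the
    ones where both the sensor and the SM link are healthy."""
    p_h = P_t[0] + P_t[4]
    return [p_h, 1.0 - p_h]
```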
Dynamic program of P: By utilizing the belief state formulation, for a finite horizon N the optimal policy π* can be obtained by solving the following dynamic program:

J_t(x_t) = min_{a_t∈A} E{ g(x_t, a_t) + J_{t+1}(x_{t+1}) },    (12)

for all x_t ∈ X and t = 0, 1, . . ., N − 1, where the terminal cost is given by J_N(x_N) = g_N. The formulation of (12) differs from the typical dynamic program for the general case of a POMDP [33], [34], due to the fact that the transition cost depends only on the observed values.
It is known that for (12) there exist optimal stationary policies [33], [34], i.e., π* = {π*_0, π*_1, . . ., π*_{N−1}}. However, since the state space X is uncountable, the recursion in (12) does not translate into a practical algorithm. Nevertheless, based on (12) we can prove that the optimal policy has certain structural properties that can be utilized for its efficient computation.
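Although the exact recursion over the uncountable X is impractical, finite-horizon backward induction over a finite approximation of the state set illustrates the structure of the dynamic program; the interface below is illustrative, not the paper's algorithm:

```python
def backward_induction(states, actions, step, stage_cost, terminal_cost, N):
    """Generic finite-horizon DP: J_t(x) = min_a { g(x, a) + E[J_{t+1}] }.
    `step(x, a)` returns (probability, next_state) pairs; all callbacks are
    user-supplied model components."""
    J = {x: terminal_cost(x) for x in states}
    policy = []
    for _ in range(N):
        J_new, pi_t = {}, {}
        for x in states:
            cost, act = min(
                (stage_cost(x, a) + sum(p * J[y] for p, y in step(x, a)), a)
                for a in actions)
            J_new[x], pi_t[x] = cost, act
        J = J_new
        policy.insert(0, pi_t)      # policies are indexed forward in time
    return J, policy
```

On a trivial absorbing two-state model this recovers the obvious "never pay extra" policy.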

IV. ANALYSIS
In this section we present structural results for the optimal policy of the POMDP P defined in the previous section. In order to represent the belief state at the (t + 1)-th time slot one has to consider the action that was taken at time t, i.e., a_t, and the observation made at t + 1, i.e., z_{t+1}; thus, we use P^{a,z}_{t+1} to represent the belief state at the (t + 1)-th time slot when a_t = a and z_{t+1} = z. In this work we assume that POMDP P satisfies the following two assumptions.

Assumption 1. Let P_t and P^+_t be two belief states such that H(P^h_t) ≤ H(P^{h,+}_t). Then H(P^{h,a,z,+}_{t+1}) ≥ H(P^{h,a,z}_{t+1}), a, z ∈ {0, 1}.
Assumption 1 states that the health status entropy H(P^{h,a,z,+}_{t+1}) of the belief state P^{a,z,+}_{t+1}, which results from belief state P^+_t by taking action a_t = a and subsequently observing z_{t+1} = z, will be larger than the health status entropy that would result if the system had started in belief state P_t, which has a lower health status entropy than P^+_t, given that the same action and observation had been made in both cases.
Assumption 2. Let I_S = {0, 1, . . ., 7} and let i ∈ I_S be the index of the system's state, with i_0 = F^MS, i_1 = F^S, and i_2 = F^SM (see Table I).
Furthermore, let p^S_{i_1 0} and p^SM_{i_2 0} be the probabilities for the sensor S and the link l_SM to make a transition from health status i_1 and i_2, respectively, to a healthy status (indicated by 0) at t + 1.
We assume that for the POMDP P the following inequality is true:

p^S_{i_1 0} p^SM_{i_2 0} (2 − P_g) ≤ 1, for all i ∈ I_S.

Assumption 2 expresses the necessary conditions and system parametrization for the probing action to always result in a lower health status entropy compared to the no-probe action. It may seem intuitive that probing reduces entropy, since it makes the generation of a status update from the sensor mandatory, i.e., it reduces the uncertainty induced in the system by the probabilistic generation of status updates at the sensor; however, one should also consider that probing introduces a new type of uncertainty in the system due to the transmission failures occurring on the MS link. As an example, consider the case where a probe was sent to the sensor yet no status update was received by the monitor. It is not certain whether this happened because the probe didn't actually reach the sensor, due to a faulty MS link, or because the sensor, or the SM link, or both were in a faulty state. Assumption 2 expresses the effect of faults in the MS link, along with that of the parameters p^S_{i_1 0}, p^SM_{i_2 0} and P_g, on the health status entropy (for details see Appendices A and B), and it is used in the following lemma to prove that the probe action will always result in the same or reduced health status entropy compared to the no-probe action for a given observation z at time t + 1.
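Given a parametrization, the per-state form of Assumption 2 derived in Appendix B, p^S_{i_1 0} p^SM_{i_2 0}(2 − P_g) ≤ 1, is straightforward to check; a sketch, with illustrative matrices in mind:

```python
def assumption2_holds(P_S, P_SM, p_g: float) -> bool:
    """Check the per-state condition p^S_{i1,0} * p^SM_{i2,0} * (2 - P_g) <= 1
    for every health-status pair (i1, i2) of the sensor and the SM link."""
    return all(P_S[i1][0] * P_SM[i2][0] * (2 - p_g) <= 1.0
               for i1 in (0, 1) for i2 in (0, 1))
```

Note how a larger P_g loosens the condition: the closer the sensor is to generating updates every slot on its own, the less extra uncertainty a probe has to overcome.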
Lemma 1. For a given observation z_{t+1} = z, the probe action results in a health status entropy that is lower than or equal to that of the no-probe action, i.e., H(P^{h,1,z}_{t+1}) ≤ H(P^{h,0,z}_{t+1}).

The proof is given in Appendix A. Next, in Lemma 2, we show that the expected cost-to-go from decision stage t up to N is an increasing function of the health status entropy.
In Lemma 4 we prove properties of the cost-to-go function J_t(·) that are necessary to establish the structural properties of the optimal policy in Theorem 1.
Lemma 4. Let J_t(x_t) be the value of the dynamic program of P at x_t = [P_t, ∆̄_t]. Then J_t(x_t) is piece-wise linear, increasing, and concave with respect to H(P^h_t) and ∆̄_t for t = 1, . . ., N.
Proof: The proof of Lemma 4 is given in Appendix D.
Finally, in Theorem 1 we show that there exist configurations of POMDP P such that the optimal policy is threshold based.
Theorem 1. At each decision stage t = 0, 1, . . ., N − 1, let [H*_t, ∆̄*_t]^T be the optimal threshold values for the health status entropy and the normalized AoI at stage t. Then the optimal policy is of a threshold type with respect to these quantities.

Computing the optimal thresholds [H*_t, ∆̄*_t]^T for t = 0, 1, . . ., N can be a computationally demanding task, especially if one considers large time horizons. To address this problem we approximate the optimal policy π* with a single threshold and utilize a Policy Gradient algorithm, namely the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm [35], in order to find it.
The SPSA algorithm appears in Algorithm 1 and operates by generating a sequence of threshold estimates that converges to a locally optimal threshold value.
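A minimal sketch of such an SPSA recursion for a scalar threshold; the gain sequences and their constants below are standard illustrative choices, not necessarily those of Algorithm 1:

```python
import random

def spsa_threshold(eval_cost, theta0=0.5, iters=200,
                   a=0.1, c=0.1, alpha=0.602, gamma=0.101):
    """SPSA for a scalar threshold.  `eval_cost(theta)` returns a (possibly
    noisy) estimate of the expected total cost of the threshold policy with
    threshold `theta`, e.g. obtained by Monte Carlo simulation."""
    theta = theta0
    for k in range(1, iters + 1):
        ak = a / k ** alpha                  # step-size sequence
        ck = c / k ** gamma                  # perturbation-size sequence
        delta = random.choice((-1.0, 1.0))   # Rademacher perturbation
        g_hat = (eval_cost(theta + ck * delta)
                 - eval_cost(theta - ck * delta)) / (2.0 * ck * delta)
        theta -= ak * g_hat                  # descend the estimated gradient
    return theta
```

SPSA needs only two cost evaluations per iteration regardless of the parameter dimension, which is what makes it attractive when each evaluation is a full policy simulation.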

VI. NUMERICAL RESULTS
In this section, we numerically evaluate the cost efficiency of the threshold probing policy we derived previously. Furthermore, we provide comparative results with an alternative probing policy that is often used in practice. The latter policy probes the sensor whenever the time that has elapsed since the last arrival of a status update at the monitor exceeds a certain threshold.
We will refer to this policy as the delay-based policy, whereas we will refer to the single-threshold policy that approximates the optimal policy as the threshold policy. We also note here that, for the system we consider in this work, the delay and AoI metrics coincide. This holds because the sensor does not buffer status updates and the status update generation scheme is fixed. A consequence of this observation is that the results we present in this section exhibit the comparative advantage of using VoI instead of AoI when deciding whether to probe or not, which shows that AoI by itself cannot capture the semantics of information beyond timeliness.
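The two policies being compared reduce to one-line decision rules; a sketch, with the threshold name `tau` being illustrative:

```python
def delay_policy(elapsed: int, D: int) -> int:
    """Baseline: probe (1) iff the time since the last received status
    update exceeds the delay threshold D."""
    return 1 if elapsed > D else 0

def voi_threshold_policy(v_t: float, tau: float) -> int:
    """Single-threshold approximation of the optimal policy: probe iff the
    Value of Information exceeds tau (the quantity SPSA estimates)."""
    return 1 if v_t > tau else 0
```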
Furthermore, in order to gain insight into how the various system parameters affect the performance of the probing policies, we formulated a basic scenario and subsequently varied its parameters. For this scenario, the system was configured as follows: c = 1, λ_1 = 1, λ_2 = 1, …
Fig. 3: Ĵ_0(·) vs. τ^f_SM when we increase the probability for the sensor to enter a faulty state.
On the other hand, the delay policy with D = 90 practically never probed the sensor since 90 is close to the entire time horizon of 100 time slots.
From Figure 2 we see that when τ^f_SM was less than 0.20 the delay policy with D = 90 and the threshold policy had similar cost efficiency. This means that probing was rarely needed for that range of τ^f_SM values. When τ^f_SM lay in the range between 0.20 and 0.30, the cost induced by the delay policy with D = 90 increased at a higher rate compared to all other policies. This indicates that probing became necessary in order to reduce the cost Ĵ_0. This is also evident from the fact that the delay-based policy with D = 10 performed closer to the threshold policy within this range of τ^f_SM values. Finally, when τ^f_SM became larger than 0.30, all delay policies saw an increase in their induced cost Ĵ_0, while the threshold-based policy managed to reduce the value of Ĵ_0. In this latter range of τ^f_SM values, the periods during which the l_SM link was in a faulty state increased in duration due to the increasing p^SM_11 value. As a consequence, the time interval between status update arrivals increased and all delay-based policies engaged in probing the sensor when the delay exceeded their threshold D. However, while the l_SM link was in a faulty state, no status update could reach the monitor. As a result, and despite the persistent probing of the sensor by the delay-based policies, neither the health status entropy nor the normalized AoI could be reduced. This is particularly evident in the abrupt increase in cost for the delay policies with D = 1 and D = 10, which persisted in probing the sensor more frequently and for longer periods due to their low values of D. On the other hand, the threshold-based policy was able to avoid unnecessary probing by utilizing the health status entropy along with the normalized AoI, i.e., it would defer probing while it was confident that the system was in a faulty state.
In Figure 3 we present the cost Ĵ_0 for a wider range of values of τ^f_SM. More specifically, we modified the basic scenario by increasing p^SM_01 from 0.1 to 0.2. As a result, the l_SM link entered its faulty state more often compared to the basic scenario, and this provided for a wider range of τ^f_SM values. All policies exhibited the same behavior as in the basic scenario for values of τ^f_SM up to 0.5. However, when τ^f_SM became larger than 0.5, we observed a reduction in the induced cost Ĵ_0 for all policies except for the delay-based policies with D = 1 and D = 10. The observed reduction in Ĵ_0 was mainly due to the reduction in the cost induced by the health status entropy.
More specifically, for large values of τ^f_SM, i.e., for large values of p^SM_11 and p^SM_01, the monitor was confident about the health status of the system, i.e., that the system was in a faulty state mainly due to the l_SM link, and this resulted in a reduced health status entropy. The delay policies with D = 1 and D = 10 increased the induced cost Ĵ_0 by persistently probing the sensor while it was in a faulty state until they succeeded in the transmission of a status update. As expected, for large values of τ^f_SM, this behavior resulted in a large number of unnecessary probes. Policies with a larger value of D also engaged in this type of probing, albeit significantly less frequently.
In Figure 4 we present the effect of the health status entropy on the cost Ĵ0 over all policies.
We modified the basic scenario by setting the transition probabilities of the l_MS link and the sensor so that they would never enter a faulty state and, even if they were randomly initialized to a faulty state, they would return to the healthy state with probability 1 in the next time slot. Thus, the system could be in one of two possible states, i.e., the states with indices i = 0 and i = 1 in Table I. This comes in contrast to the eight possible states of the basic scenario and resulted in a reduced cost due to the health status entropy for all policies and the whole range of τ^f_SM values. Furthermore, in Figure 4 we do not observe a significant reduction of Ĵ_0 when τ^f_SM ≥ 0.5, as in Figure 3. This is because the decrease of the health status entropy cost as τ^f_SM increased was not, in this scenario, as significant as the increase of the normalized AoI and probing costs. Finally, in Figure 5 we present the effect of an increasing time horizon N on Ĵ_0(·). We modified the basic system setup by setting p^SM_11 = 0.9 and λ_2 = N/100. The latter modification is necessary because otherwise the cost induced by the normalized AoI becomes negligible as the time horizon increases. By setting λ_2 = N/100 we had a normalized AoI cost of λ_2 ∆̄ = ∆/100, which was analogous to that of the basic scenario irrespective of the time horizon N. Figure 5 shows that an increase of N results in an increased Ĵ_0(·) for all policies. Furthermore, by calculating the relative difference of the threshold policy with respect to the delay policy with D = 90, we observed that it achieved a constant reduction of 16% across all experiments.

VII. CONCLUSIONS
In this work, we addressed the problem of deriving an efficient policy for sensor probing in IoT networks with intermittent faults. We adopted a semantics-aware communications paradigm for the transmission of probes, whereby the importance (semantics) of a probe is considered before its generation and transmission. We formulated the problem as a POMDP, proved that the optimal policy is of a threshold type, and used a computationally efficient stochastic approximation algorithm to derive the probing policy. Finally, the numerical results presented in this work exhibit a significant cost reduction when the derived probing policy is followed instead of a conventional delay-based one.

APPENDIX B
In this appendix we show that Assumption 2 is equivalent to $\xi_1 + \xi_2 \le \phi_s$. For convenience we repeat here the definitions of $\xi_1$, $\xi_2$ and $\phi_s$, where $I_S = \{0, 1, \ldots, 7\}$ and in $\phi_s$ we changed the order of summation. Now, by substituting into $\xi_1 + \xi_2 \le \phi_s$ we get an inequality which can be written as
$$\sum_{i=0}^{7} p^t_i \Big[ p_{i0}(1 - P_g) + p_{i4}(1 - P_g) - \sum_{j \in I_S \setminus \{0,4\}} p_{ij} \Big] \le 0. \tag{14}$$
Subsequently, we express the transition probability from the state with index $i$ to the state with index $j$ as $p_{ij} = P[s_{t+1} = j \mid s_t = i] = p^{MS}_{i_0 j_0}\, p^{S}_{i_1 j_1}\, p^{SM}_{i_2 j_2}$, where $i_0$, $i_1$, and $i_2$ represent respectively the states of the $l_{MS}$ link, the sensor, and the $l_{SM}$ link at time $t$, while $j_0$, $j_1$, and $j_2$ represent respectively the states of the $l_{MS}$ link, the sensor, and the $l_{SM}$ link at time $t+1$. Thus, the expression $p_{i0}(1 - P_g) + p_{i4}(1 - P_g) - \sum_{j \in I_S \setminus \{0,4\}} p_{ij}$ can be written as
$$p^{MS}_{i_0 0}\, p^{S}_{i_1 0}\, p^{SM}_{i_2 0}(1 - P_g) + p^{MS}_{i_0 1}\, p^{S}_{i_1 0}\, p^{SM}_{i_2 0}(1 - P_g) - \sum_{j \in I_S \setminus \{0,4\}} p_{ij}.$$
Through simple algebraic manipulations, i.e., using $p^{MS}_{i_0 0} + p^{MS}_{i_0 1} = 1$ and $\sum_{j \in I_S} p_{ij} = 1$, the latter expression can be shown to be equal to $p^{S}_{i_1 0}\, p^{SM}_{i_2 0}(2 - P_g) - 1$ and, based on this result, (14) can be expressed as
$$\sum_{i=0}^{7} p^t_i \Big[ p^{S}_{i_1 0}\, p^{SM}_{i_2 0}(2 - P_g) - 1 \Big] \le 0.$$

We prove Lemma 4 using induction. At the final stage, $t = N$, we have that $J_N(P, \Delta) = \lambda_1 H(P^h) + \lambda_2 \Delta$, which is linear in $H(P^h)$ and $\Delta$. For stage $t = N - 1$, we have that,

DRAFT February 3, 2022

... status entropy. With similar arguments it can be shown that the probe action will also be optimal for states with a higher normalized AoI than $x_t$. As a result, at each decision stage $t$, the optimal policy is threshold-based with respect to $V_t$.
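The algebraic step above can be sanity-checked numerically. The sketch below assumes the composite state index factorizes as $i = 4 i_0 + 2 i_1 + i_2$, so that the states with indices 0 and 4 differ only in the $l_{MS}$ component; the helper name `check_identity` is hypothetical:

```python
import itertools

def check_identity(P_MS, P_S, P_SM, Pg):
    """For every composite state i, verify that
       p_i0*(1-Pg) + p_i4*(1-Pg) - sum_{j not in {0,4}} p_ij
       equals p^S_{i1,0} * p^SM_{i2,0} * (2 - Pg) - 1."""
    for i0, i1, i2 in itertools.product([0, 1], repeat=3):
        # composite transitions p_ij = p^MS_{i0 j0} * p^S_{i1 j1} * p^SM_{i2 j2}
        p = {(j0, j1, j2): P_MS[i0][j0] * P_S[i1][j1] * P_SM[i2][j2]
             for j0, j1, j2 in itertools.product([0, 1], repeat=3)}
        p_i0 = p[(0, 0, 0)]                       # j = 0  <->  (0, 0, 0)
        p_i4 = p[(1, 0, 0)]                       # j = 4  <->  (1, 0, 0)
        rest = sum(v for key, v in p.items()
                   if key not in [(0, 0, 0), (1, 0, 0)])
        lhs = (p_i0 + p_i4) * (1 - Pg) - rest
        rhs = P_S[i1][0] * P_SM[i2][0] * (2 - Pg) - 1
        assert abs(lhs - rhs) < 1e-12
    return True

# arbitrary row-stochastic 2x2 component matrices
P_MS = [[0.7, 0.3], [0.4, 0.6]]
P_S  = [[0.9, 0.1], [0.2, 0.8]]
P_SM = [[0.8, 0.2], [0.5, 0.5]]
print(check_identity(P_MS, P_S, P_SM, Pg=0.6))  # -> True
```

The check relies on exactly the two facts used in the derivation: the rows of $P^{MS}$ sum to one, and the full transition row $\sum_j p_{ij}$ sums to one.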

The SPSA algorithm generates a sequence of thresholds $\theta_k$, $k = 1, 2, \ldots, K$, that converges to a local minimum, i.e., an approximation of the best single-threshold policy for POMDP $\mathcal{P}$. The SPSA algorithm picks a single random direction $\omega_k$ along which the derivative is evaluated at each step $k$, i.e., $\omega^H_k$ and $\omega^\Delta_k$ are independently generated according to a Bernoulli distribution, as presented in line 4 of Algorithm 1. Subsequently, in line 5 the algorithm generates threshold vectors $\theta^+_k$ and $\theta^-_k$, which are bounded element-wise in the interval $[0, 1]$, i.e., $\mathbf{0}$ and $\mathbf{1}$ in line 5 are column vectors whose elements are all zeros and ones respectively. $\theta^\Delta_k$ is bounded in $[0, 1]$ since we assumed a normalized value for the AoI, and this is also true for $\theta^H_k$ since the maximum health status entropy occurs for $P^h_t = [0.5, 0.5]$, which evaluates to 1. In line 6 the estimates $\hat{J}(\theta^+)$ and $\hat{J}(\theta^-)$ are computed by simulating the POMDP $\mathcal{P}$ $M_s$ times under the corresponding single-threshold policy. Finally, the gradient is estimated in line 7, using element-by-element division, and $\theta_k$ is updated in line 8. Since the SPSA algorithm converges to local optima, it is necessary to try several initial conditions $\theta_0$.

Algorithm 1 Policy gradient algorithm for probing control
1: Initialize threshold $\theta_0 = [\theta^H_0, \theta^\Delta_0]$ and $\gamma, A, \eta, \beta, \zeta$
2: for $k = 1$ to $K$ do
3:
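The update loop described above can be sketched as follows. This is a minimal SPSA sketch, assuming the standard gain sequences $\gamma_k = \gamma / k^{\zeta}$ and $\eta_k = \eta / (k + A)^{\beta}$; the paper's exact gain schedules and the Monte-Carlo simulator behind $\hat{J}(\cdot)$ are not reproduced here, so `J_hat` is a stand-in:

```python
import numpy as np

def spsa(J_hat, K=500, gamma=0.1, A=10.0, eta=0.3, beta=0.602, zeta=0.101, seed=0):
    """SPSA over the two-dimensional threshold theta = [theta_H, theta_Delta].

    J_hat(theta) is assumed to return a (possibly noisy) cost estimate,
    e.g. an average over M_s simulated POMDP trajectories.
    """
    rng = np.random.default_rng(seed)
    theta = np.array([0.5, 0.5])                   # initial threshold theta_0
    for k in range(1, K + 1):
        g_k = gamma / k**zeta                      # perturbation step size
        e_k = eta / (k + A)**beta                  # gradient step size
        omega = rng.choice([-1.0, 1.0], size=2)    # Bernoulli(+-1) direction (line 4)
        # perturbed thresholds, clipped element-wise to [0, 1]   (line 5)
        t_plus = np.clip(theta + g_k * omega, 0.0, 1.0)
        t_minus = np.clip(theta - g_k * omega, 0.0, 1.0)
        # two-sided gradient estimate with element-by-element division (lines 6-7)
        grad = (J_hat(t_plus) - J_hat(t_minus)) / (2.0 * g_k * omega)
        theta = np.clip(theta - e_k * grad, 0.0, 1.0)   # update (line 8)
    return theta

# toy objective with a known minimum at [0.3, 0.7]
target = np.array([0.3, 0.7])
theta_star = spsa(lambda th: float(np.sum((th - target) ** 2)))
```

Because SPSA only converges to local optima, in practice one would rerun `spsa` from several initial thresholds $\theta_0$ and keep the best result, as noted above.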
and by increasing $p^{SM}_{01}$ from 0.1 to 0.2 in order to obtain the same range of $\tau^f_{SM}$ values as in Figure 3. By setting the matrices $P^{MS}$ and $P^{S}$ to the values presented above, both the link $l_{MS}$ and the sensor $S$
Lemma 2. Let $x_t = [P_t, \hat{\Delta}_t]$ and let $J_t(\cdot)$ be the cost-to-go function in the dynamic program of $\mathcal{P}$; then for $t = 1, \ldots, N$, it holds that $J_t(P^+_t, \hat{\Delta}_t) \ge J_t(P^-_t, \hat{\Delta}_t)$.
Proof: The proof of Lemma 2 is given in Appendix C. In Lemma 3 we state a similar property for the expected cost-to-go when the value of the AoI increases. We omit the proof of Lemma 3 since it is intuitive and follows a similar line of arguments as that of Lemma 2.

Lemma 3. Let $\hat{\Delta}^+_t$ and $\hat{\Delta}^-_t$ be normalized AoI values such that $\hat{\Delta}^+_t \ge \hat{\Delta}^-_t$ and let $J_t(\cdot)$ be the cost-to-go function in the dynamic program (12); then for $t = 0, 1, \ldots, N - 1$, it holds that $J_t(P_t, \hat{\Delta}^+_t) \ge J_t(P_t, \hat{\Delta}^-_t)$.