Distributed Channel Access for Control Over Known and Unknown Gilbert–Elliott Channels

In this article, we consider the distributed channel access problem for a system consisting of multiple control subsystems that close their loops over a shared wireless network with multiple channels subject to Markovian packet dropouts. Provided that an acknowledgement/negative-acknowledgement feedback mechanism is in place, we show that this problem can be formulated as a Markov decision process. We then transform this problem to a form that enables distributed control-aware channel access. More specifically, we show that the control objective can be minimized without requiring information exchange between subsystems as long as the channel parameters are known. The objective is attained by adopting a priority-based deterministic channel access method, and the stability of the system under the resulting scheme is analyzed. Next, we consider a practical scenario in which the channel parameters are unknown and adopt a learning method based on Bayesian inference, which is compatible with distributed implementation. We propose a heuristic posterior sampling algorithm, which simulations show to significantly improve performance.

Tahmoores Farjam, Henk Wymeersch, and Themistoklis Charalambous

Tahmoores Farjam is with the Department of Electrical Engineering and Automation, School of Electrical Engineering, Aalto University, 02150 Espoo, Finland (e-mail: tahmoores.farjam@aalto.fi).

Index Terms-Bayesian
Henk Wymeersch is with the Department of Electrical Engineering, Chalmers University of Technology, 41296 Göteborg, Sweden (e-mail: henkw@chalmers.se).
Themistoklis Charalambous is with the Department of Electrical and Computer Engineering, School of Engineering, University of Cyprus, 1678 Nicosia, Cyprus, and also with the Department of Electrical Engineering and Automation, School of Electrical Engineering, Aalto University, 02150 Espoo, Finland (e-mail: themistoklis.charalambous@aalto.fi).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TAC.2023.3279902.
Digital Object Identifier 10.1109/TAC.2023.3279902

I. INTRODUCTION
... computational capabilities at a lower cost. Wireless communication plays a key role in modern control environments since adopting wireless sensors improves scalability and flexibility and facilitates bringing new disruptive technologies to the market [2]. The communication resources within these environments are often shared among various control loops, and such systems are often referred to as wireless networked control systems (WNCSs).
Using wireless communication for information exchange in the control loops introduces several unique challenges that stem from nonnegligible transmission error probability. This leads to packet dropouts, which are typically modeled as an independent and identically distributed (i.i.d.) Bernoulli sequence. The impact of this phenomenon on the solution of the optimal estimation and linear quadratic Gaussian (LQG) control problem for a single loop has been investigated in seminal works [3] and [4], respectively. The i.i.d. assumption, however, corresponds to environments where path loss and small-scale fading are dominant. In industrial environments, large moving objects lead to shadow fading and burst errors, which cause correlated packet dropouts [5], [6]. This correlation can be approximated by modeling the communication channel as a time-homogeneous two-state Markov chain known as the Gilbert-Elliott (GE) model [7], [8]. The impact of this type of channel on a single control loop has also been studied [9], [10], [11], [12].
Typically, WNCSs contain several control loops, hereon called subsystems, which communicate over a shared network to perform their individual tasks. The limited capacity of the network necessitates that only a subset of subsystems is allowed to communicate within each time slot. Devising a policy for choosing a suitable subset of subsystems for achieving the desired objective given the communication constraints is known as the scheduling or channel access problem. These policies often require a central entity in the network to solve a complex optimization problem and orchestrate channel access, thus impeding scalability. In this article, we consider the channel access problem over GE channels in the absence of a central coordinator in the network. We derive the stability conditions for our proposed distributed channel access method and also extend its application to scenarios where the underlying parameters of the GE channels are unknown.

A. Related Works
The seminal work [3] investigated the effect of i.i.d. packet dropouts on Kalman filtering and showed that a critical dropout rate exists beyond which the estimation error covariance cannot be bounded. This paved the way for a plethora of works on sensor scheduling policies over ideal channels such that stability of the filter is preserved despite the intermittent arrival of data packets. For instance, the single-, two-, and multisensor scheduling problems subject to energy constraints were studied in [13], [14], and [15], [16], respectively, showing that the optimal schedule can be approximated by a periodic one. Sensor scheduling with the possibility of i.i.d. packet dropouts during transmission has also been studied for bandwidth-limited systems [17], [18], [19] as well as systems with energy harvesting capabilities [20], [21], [22]. In many practical scenarios, channel states and, consequently, packet dropouts are time-correlated, which motivates the use of the GE channel model instead. The study of this model in WNCSs has been mainly concerned with stability [9], [11], [23], [24] and scheduling of a single sensor for remote estimation [25], [26]. To the best of the authors' knowledge, the only works that consider the closely related scenario of multiple GE channels are [27] and [28].
The sensor scheduling problem for remote estimation is in itself an interesting and prominent problem for applications such as target tracking. Nevertheless, state estimation is also of paramount importance to feedback control. In the seminal work [4], the LQG problem for a single control loop subject to i.i.d. packet losses was considered, and the certainty equivalence principle was shown to hold if instantaneous packet acknowledgements/negative-acknowledgements (ACK/NACKs) are available through an error-free feedback channel. Regarding the design of the channel access policy, however, it is shown that the channel access decisions should also be independent of the control inputs for certainty equivalence to hold [29], [30]. It has been shown that minimizing the LQG cost for WNCSs with a certainty equivalent controller and i.i.d. channels requires solving a mixed-integer quadratic program [31]. The high computational complexity of this problem has motivated the adoption of LQG-related costs for prioritizing data transmission in a computationally tractable manner [32], [33].
Distributed channel access methods are desirable for WNCSs since they offer higher security and allow for flexibility and scalability. Typically, due to the computational intractability of the optimal scheduling solutions [14], [15], [16], [17], [18], [19], approximate solutions are proposed as a threshold policy [15], [17], [18] or a periodic schedule [14], [16], which, in theory, can be successfully implemented with time-division multiple access or carrier sense multiple access schemes, respectively. Nevertheless, performance of such systems can deteriorate drastically in practice due to additional packet dropouts that happen because of the prolonged delay or collisions [34]. This has motivated novel control-aware distributed channel access methods, such as try-once-discard (TOD) [35] and the timer-based mechanism (TBCoIL) [36] for wired networks. Unlike TOD, TBCoIL is also capable of operating over wireless networks [37] and, more importantly, it allows for learning the parameters of the communication channels for control-aware channel access. Applying reinforcement learning methods for learning the unknown system dynamics has a long history in the control community; see [38]. Such methods have also been applied for near-optimal sensor scheduling over channels with known i.i.d. packet dropout rates [18], [39] or for learning the unknown dropout rates [37]. In the setting closest to ours, a centralized method for learning the channel statistics and scheduling over GE channels has been proposed in [27], where the variations of channel states are assumed to be fully observable.

B. Main Contributions
In this article, we consider a WNCS consisting of multiple subsystems and multiple GE channels without a central scheduling unit for coordinating channel access. The limited communication resources are such that only a subset of sensors can utilize the shared network to communicate with their corresponding estimator. We first show that, despite the partial observations of the channel states, the optimal scheduling problem in the LQG sense can be formulated as a Markov decision process (MDP). To the best of the authors' knowledge, this is the first time that multiple partially observable GE channels have been considered in WNCSs and such a formulation is provided. The scenario closest to ours is investigated in [27], where the state variations of wireless links are assumed to be identical for all subsystems, thereby resulting in full observations. For distributed control-aware channel access, we then utilize the concept of cost of information loss (CoIL), originally introduced in [33], and show that the resulting priority measure can be utilized in TBCoIL. More specifically, the resulting priority measure for minimizing the stage cost can be calculated by each sensor individually and without requiring any explicit information exchange between them, which enables distributed channel access with TBCoIL. We then derive the conditions under which implementing TBCoIL is guaranteed to stabilize the system. The framework used for stability analysis is inspired by a work on protocols with redundant data transmission [40], but our method differs significantly from the original work [40] and also from the seminal works [3], [4], [9].
Operation of TBCoIL assumes knowledge of the parameters of the underlying GE model. This can be restrictive in practice, and thus we relax this assumption by adopting a Bayesian framework [41] for learning the channel parameters. This method enables us to reduce uncertainty in the channel parameters by incorporating information that is obtained from partial observations of the channel state variation. We then propose a heuristic posterior sampling algorithm that, in addition to computational tractability, allows us to address the exploration/exploitation dilemma in a distributed and control-aware manner through TBCoIL.

C. Organization and Notation
The rest of this article is organized as follows. In Section II, we provide the system model and the necessary preliminaries. In Section III, we provide the MDP formulation of the channel access problem, propose a distributed solution, and establish the stability conditions. The adopted Bayesian framework for learning the GE channel parameters is described in Section IV and the proposed learning algorithm is presented therein. In Section V, we numerically evaluate the performance of the proposed methods. Finally, Section VI concludes this article.
Notation: $\mathbb{Z}_{\geq 0}$ ($\mathbb{Z}_{>0}$) denotes the set of nonnegative (positive) integers. The transpose, inverse, and trace of a square matrix $X$ are denoted by $X^T$, $X^{-1}$, and $\mathrm{tr}(X)$, respectively, while the notation $X \succeq 0$ ($X \succ 0$) means that matrix $X$ is positive semidefinite (definite). $\mathbb{E}\{\cdot\}$ represents the expectation of its argument and $\mathbb{P}\{\cdot\}$ denotes the probability of an event. $f^n(\cdot)$ is the $n$-fold composition of $f(\cdot)$, with the convention that $f^0(X) = X$. The Euclidean norm of a vector $x$ is denoted by $\|x\|$ and $\sigma_{\max}(X)$ denotes the spectral radius of a matrix $X$. The $n$ by $n$ identity matrix is represented by $I_n$. $1_{n \times p}$ and $0_{n \times p}$ represent an all-one and all-zero $n$ by $p$ matrix, respectively. Finally, the cardinality of a set $\mathcal{X}$ is denoted by $|\mathcal{X}|$.

Fig. 1. Example of the WNCS layout with $N$ subsystems competing to access a shared channel $j$. $P_i$ represents the plant of subsystem $i$, with $S_i$, $E_i$, and $C_i$ being its sensor, estimator, and controller, respectively. Note that the timer is embedded in the sensor block.

II. SYSTEM MODEL AND PRELIMINARIES
The layout of the considered WNCS is depicted in Fig. 1. We consider multiple subsystems with decoupled dynamics that share a multichannel wireless network for information exchange between their sensor and controller. The detailed model of the involved components is described in the following.

A. Local Processes and Measurements
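Let $\mathcal{N}$ denote the index set of subsystems with $|\mathcal{N}| = N$. Each subsystem $i \in \mathcal{N}$ is modeled by a linear time-invariant process as follows:

$$x_{i,k+1} = A_i x_{i,k} + B_i u_{i,k} + w_{i,k} \tag{1a}$$
$$y_{i,k} = C_i x_{i,k} + v_{i,k} \tag{1b}$$

where $x_{i,k}$, $u_{i,k}$, and $y_{i,k}$ are the state, control input, and measurement of subsystem $i$ at time $k$, and $A_i$, $B_i$, and $C_i$ are matrices of appropriate dimensions. The initial state $x_{i,0}$, the process noise $w_{i,k}$, and the measurement noise $v_{i,k}$, respectively, are assumed to be uncorrelated zero-mean Gaussian random variables with respective covariances $X_{i,0} \succeq 0$, $W_i \succeq 0$, and $V_i \succ 0$.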
We assume that smart sensors with sufficient memory and computational capacity take the measurements (1b). This allows each sensor to run a local Kalman filter to compute the minimum mean square error estimate of the state, which is to be transmitted to the corresponding estimator. This setup is commonly used for remote estimation since it improves performance by resulting in a smaller error covariance at the estimator [42]. Let $Y_{i,k} = \{y_{i,0}, \ldots, y_{i,k}\}$ be the history of measurements at the smart sensor of subsystem $i \in \mathcal{N}$ and define $\hat{x}^s_{i,k|k-1} \triangleq \mathbb{E}\{x_{i,k} \mid Y_{i,k-1}\}$ and $\hat{x}^s_{i,k|k} \triangleq \mathbb{E}\{x_{i,k} \mid Y_{i,k}\}$ as the a priori and a posteriori state estimates, respectively, and define $P^s_{i,k|k-1} \triangleq \mathbb{E}\{(x_{i,k} - \hat{x}^s_{i,k|k-1})(x_{i,k} - \hat{x}^s_{i,k|k-1})^T \mid Y_{i,k-1}\}$ and $P^s_{i,k|k} \triangleq \mathbb{E}\{(x_{i,k} - \hat{x}^s_{i,k|k})(x_{i,k} - \hat{x}^s_{i,k|k})^T \mid Y_{i,k}\}$ as the a priori and a posteriori error covariances at the smart sensor, respectively. All these are determined by the standard Kalman filter equations. We assume that for all $i \in \mathcal{N}$ the pair $(A_i, C_i)$ is observable and the pair $(A_i, W_i^{1/2})$ is controllable. As a result, the steady-state value of the a posteriori error covariance, i.e., $P^s_{i,k|k}$ for $k \to \infty$, exists and we denote it by $P_i$ [43, Ch. 5, p. 110]. Since convergence to steady state occurs at an exponential rate, we can safely assume that the local Kalman filter has already entered steady state [18], [27], [39], [44]. Therefore, at each time $k$, the generated data packet at the sensor contains $\hat{x}^s_{i,k|k}$, which has error covariance $P_i$.
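As an illustration of the steady-state assumption, the following minimal sketch iterates the standard Kalman recursion for a scalar subsystem until the a posteriori variance settles. The function name, the numerical values, and the initial variance are hypothetical choices for illustration only; the scalar restriction is an assumption made to keep the sketch free of matrix libraries.

```python
def steady_state_posterior_variance(a, c, w, v, iters=1000):
    """Iterate the scalar Kalman recursion and return the a posteriori variance.

    a, c: scalar system and output coefficients; w, v: process and
    measurement noise variances (all illustrative values, not from the paper).
    """
    p_post = 1.0  # arbitrary initial a posteriori variance (assumption)
    for _ in range(iters):
        p_prior = a * p_post * a + w                # time update
        gain = p_prior * c / (c * p_prior * c + v)  # Kalman gain
        p_post = (1.0 - gain * c) * p_prior         # measurement update
    return p_post


# Unstable scalar plant with unit noise variances (illustrative numbers).
p_bar = steady_state_posterior_variance(a=1.2, c=1.0, w=1.0, v=1.0)
```

Since the recursion converges exponentially fast under the stated observability and controllability assumptions, a fixed number of iterations is sufficient in this sketch.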

B. Communication Channels
Let $\mathcal{M}$ denote the index set of the available channels with $|\mathcal{M}| = M$ and define

$$\delta_{i,j,k} = \begin{cases} 1, & \text{if } i \text{ transmits } \hat{x}^s_{i,k|k} \text{ on channel } j \\ 0, & \text{otherwise.} \end{cases} \tag{2}$$

Since the wireless links are unreliable, transmission of sensor $i$ on channel $j$ at time $k$, i.e., $\delta_{i,j,k} = 1$, might be unsuccessful. We assume that each subsystem can listen to each of the $M$ channels simultaneously. We further assume that the network protocol supports packet ACK/NACKs and that they are guaranteed to be received by the transmitter [19], [45]. Let $\gamma_{i,j,k} \in \{0, 1\}$ correspond to this such that $\gamma_{i,j,k} = 1$ if $\delta_{i,j,k} = 1$ and the data packet is successfully received; otherwise, $\gamma_{i,j,k} = 0$. In addition, to represent whether estimator $i$ receives the data packet at $k$, we define in (3) a binary indicator that equals 1 if $\gamma_{i,j,k} = 1$ for some channel $j \in \mathcal{M}$ and 0 otherwise. We assume that one slot is sufficient for conveying all the information from the sensor to the estimator and that, at any time slot $k$, each subsystem occupies one channel at most, i.e.,

$$\sum_{j \in \mathcal{M}} \delta_{i,j,k} \leq 1, \quad \forall i \in \mathcal{N}. \tag{4}$$

Furthermore, we impose the following constraint on the channel access decisions to ensure collision-free transmission:

$$\sum_{i \in \mathcal{N}} \delta_{i,j,k} \leq 1, \quad \forall j \in \mathcal{M}. \tag{5}$$

The effects of state quantization and transmission delays are considered negligible and are, thus, ignored henceforth.

Fig. 2 depicts the two-state Markov chain corresponding to the GE channel model considered here. Let $c_{i,j,k} \in \{G, B\}$ denote the (possibly hidden) state of the wireless link at $k$, which can be either good or bad, denoted by $G$ and $B$, respectively. Then, data transmission over a link ($\delta_{i,j,k} = 1$) is successful ($\gamma_{i,j,k} = 1$) if the link is in the good state ($c_{i,j,k} = G$); otherwise, the data packet is dropped. The quality of each link is associated with the failure rate and recovery rate defined as

$$p_{i,j} \triangleq \mathbb{P}\{c_{i,j,k+1} = B \mid c_{i,j,k} = G\}, \qquad q_{i,j} \triangleq \mathbb{P}\{c_{i,j,k+1} = G \mid c_{i,j,k} = B\} \tag{6}$$

respectively.
In case the channel state is not observed at a given time $k$, the sensor can still maintain a belief of the channel being $G$ at the next time step. Denoting this belief by $b_{i,j,k}$, its evolution is given by

$$b_{i,j,k+1} = (1 - p_{i,j})\, b_{i,j,k} + q_{i,j}\, (1 - b_{i,j,k}). \tag{7}$$

When the channel state is not observed consecutively, the belief monotonically converges to the stationary probability of the channel state being $G$, which is given by

$$\frac{q_{i,j}}{p_{i,j} + q_{i,j}}. \tag{8}$$
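The belief propagation over an unobserved GE link can be sketched as follows, assuming the standard one-step update $b \leftarrow (1 - p)\,b + q\,(1 - b)$ with failure rate $p$ and recovery rate $q$. The function names and the numerical rates are hypothetical and serve only to illustrate the convergence to the stationary probability $q/(p+q)$.

```python
def propagate_belief(belief, p, q, steps=1):
    """Propagate the belief of the link being in the good state.

    p: failure rate P{G -> B}; q: recovery rate P{B -> G}.
    Each step corresponds to one slot without a channel observation.
    """
    for _ in range(steps):
        belief = (1.0 - p) * belief + q * (1.0 - belief)
    return belief


def stationary_good_probability(p, q):
    """Stationary probability of the good state of the two-state chain."""
    return q / (p + q)


p, q = 0.2, 0.5  # illustrative channel parameters
after_success = propagate_belief(1.0, p, q)  # last observation was G
after_failure = propagate_belief(0.0, p, q)  # last observation was B
long_run = propagate_belief(1.0, p, q, steps=200)
```

Starting from an observed good state, the belief after one slot equals $1 - p$; starting from an observed bad state, it equals $q$. After many unobserved slots, both approach the stationary value.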

C. Control and Estimation
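We choose the standard quadratic cost over the infinite horizon as the performance metric, which is given by

$$J_\infty = \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\Big\{ \sum_{k=0}^{T-1} \sum_{i \in \mathcal{N}} \big( x_{i,k}^T Q_i x_{i,k} + u_{i,k}^T R_i u_{i,k} \big) \Big\} \tag{9}$$

where $Q_i \succeq 0$ and $R_i \succ 0$ are weighting matrices of appropriate dimensions. We assume that the channel access decisions are independent of the control inputs, thus guaranteeing that the certainty equivalence principle holds [32]. As will become apparent in the following sections, our channel access policies indeed satisfy this assumption. Therefore, the optimal controller is linear and given by

$$u_{i,k} = L_{i,\infty}\, \hat{x}_{i,k|k} \tag{10}$$

where $L_{i,\infty}$ is the optimal feedback gain determined by

$$L_{i,\infty} = -(B_i^T \Pi_{i,\infty} B_i + R_i)^{-1} B_i^T \Pi_{i,\infty} A_i \tag{11}$$

where $\Pi_{i,\infty}$ is the positive semidefinite solution of the discrete-time algebraic Riccati equation

$$\Pi_{i,\infty} = A_i^T \Pi_{i,\infty} A_i + Q_i - L_{i,\infty}^T (B_i^T \Pi_{i,\infty} B_i + R_i) L_{i,\infty}. \tag{12}$$

By making the common assumption that the actuation links are perfect [21], [29], [30], [31], [32], [33] and based on the assumption that the pairs $(A_i, B_i)$ and $(A_i, Q_i^{1/2})$ are controllable and observable, respectively, the positive semidefinite solution of (12) always exists [46, Ch. 6]. Let $\hat{x}_{i,k|k} \triangleq \mathbb{E}\{x_{i,k} \mid \mathcal{I}_{i,k}\}$ denote the a posteriori state estimate provided by the estimator at the controller side. The information pattern $\mathcal{I}_{i,k}$, described in (13), consists of the successfully received estimates from the sensor and the past applied inputs. Furthermore, the estimator can infer the time elapsed since the most recent successful packet reception, which is denoted by $t_{i,k}$ and defined in (14). Then, the computations at the estimator can compactly be written as

$$\hat{x}_{i,k|k} = A_i^{t_{i,k}}\, \hat{x}^s_{i,k-t_{i,k}|k-t_{i,k}} + \sum_{c=1}^{t_{i,k}} A_i^{c-1} B_i u_{i,k-c} \tag{15}$$
$$P_{i,k|k} = h_i^{t_{i,k}}(P_i) \tag{16}$$

where $P_{i,k|k} \triangleq \mathbb{E}\{(x_{i,k} - \hat{x}_{i,k|k})(x_{i,k} - \hat{x}_{i,k|k})^T \mid \mathcal{I}_{i,k}\}$ denotes the estimation error covariance at the estimator and the Lyapunov operator $h_i$ is defined as $h_i(X) \triangleq A_i X A_i^T + W_i$. Due to optimality of the certainty equivalent controller and separation of its design from the channel access decisions, the problem of obtaining the optimal channel access scheme for minimizing (9) can be formulated as Problem 1.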
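Problem 1:

$$\min_{\{\Delta_k\}_{k \geq 0}} J_\infty \quad \text{subject to (4), (5)} \tag{17}$$

where $\Delta_k$ is a binary matrix that includes all the optimization variables at time $k$, i.e.,

$$\Delta_k \triangleq [\delta_{i,j,k}]_{N \times M}. \tag{18}$$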
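To make the controller design concrete, the following minimal sketch solves the discrete-time algebraic Riccati equation for a scalar subsystem by fixed-point iteration and computes the corresponding certainty equivalent feedback gain. The scalar restriction, the function names, and the numerical values are assumptions made for illustration; a matrix-valued implementation would typically rely on a dedicated solver instead.

```python
def solve_scalar_dare(a, b, q_weight, r_weight, iters=10_000, tol=1e-12):
    """Fixed-point iteration on the scalar Riccati equation.

    a, b: scalar dynamics and input coefficients.
    q_weight, r_weight: state and input weights of the quadratic cost.
    """
    pi = q_weight  # initial guess (assumption)
    for _ in range(iters):
        nxt = a * pi * a + q_weight - (a * pi * b) ** 2 / (b * pi * b + r_weight)
        if abs(nxt - pi) < tol:
            return nxt
        pi = nxt
    return pi


def feedback_gain(a, b, r_weight, pi):
    """Scalar gain L such that the control law is u = L * x_hat."""
    return -(b * pi * a) / (b * pi * b + r_weight)


a, b, q_weight, r_weight = 1.2, 1.0, 1.0, 1.0  # illustrative values
pi_inf = solve_scalar_dare(a, b, q_weight, r_weight)
gain = feedback_gain(a, b, r_weight, pi_inf)
closed_loop = a + b * gain  # closed-loop coefficient under perfect state knowledge
```

For the unstable open-loop coefficient chosen here, the resulting closed-loop coefficient lies strictly inside the unit interval, which reflects the stabilizing property of the gain under the stated controllability and observability assumptions.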

D. Cost of Information Loss
The concept of CoIL was introduced in [33] to capture the impact of the loss of information of a subsystem on the cost of the entire system. Define $E^0_{i,k}$ as the cost of subsystem $i$ in case it does not receive any data at $k$; similarly, $E^1_{i,k}$ is the cost when its data packet is successfully received. The CoIL for subsystem $i$ at time $k$ is defined as

$$\mathrm{CoIL}_{i,k} \triangleq E^0_{i,k} - E^1_{i,k}. \tag{19}$$
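This concept can be utilized for solving the optimal channel access problem. Let $\mathcal{F}_k \subseteq \mathcal{N}$ denote the set of subsystems that transmit their data packet at $k$ and $\bar{\mathcal{F}}_k \triangleq \mathcal{N} \setminus \mathcal{F}_k$. Assuming perfect communication channels and a one-step horizon, the expected value of the stage cost, denoted by $J_k$, can be written as

$$J_k = \sum_{i \in \mathcal{F}_k} E^1_{i,k} + \sum_{i \in \bar{\mathcal{F}}_k} E^0_{i,k} = \sum_{i \in \mathcal{N}} E^0_{i,k} - \sum_{i \in \mathcal{F}_k} \mathrm{CoIL}_{i,k}. \tag{20}$$

Since the first term in the last line of (20) is independent of the channel access decisions, minimizing the cost is equivalent to finding $\mathcal{F}_k$ such that the last term is maximized.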
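The equivalence between minimizing the stage cost and maximizing the accumulated CoIL can be illustrated with a small sketch, assuming perfect channels, a one-step horizon, and CoIL taken as the difference between the cost without and with a successful reception. The function names and the cost values below are hypothetical.

```python
def coil(cost_no_reception, cost_reception):
    """Cost of information loss: cost without data minus cost with data."""
    return cost_no_reception - cost_reception


def select_transmitters(e0, e1, num_slots):
    """Pick the subsystems with the largest CoIL (perfect channels assumed)."""
    coils = [coil(x, y) for x, y in zip(e0, e1)]
    ranked = sorted(range(len(coils)), key=lambda i: coils[i], reverse=True)
    return set(ranked[:num_slots]), coils


def stage_cost(e0, e1, transmitters):
    """Stage cost when only the subsystems in `transmitters` receive data."""
    return sum(e1[i] if i in transmitters else e0[i] for i in range(len(e0)))


e0 = [5.0, 9.0, 4.0]  # illustrative costs without reception
e1 = [1.0, 2.0, 3.5]  # illustrative costs with reception
chosen, coils = select_transmitters(e0, e1, num_slots=2)
cost_of_choice = stage_cost(e0, e1, chosen)
```

In this example, the two subsystems with the largest CoIL are selected, and the resulting stage cost equals the sum of the no-reception costs minus the CoIL of the selected subsystems.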

E. Timer-Based Mechanism
Inspired by the celebrated result for relay selection in wireless cooperative networks [47], the TBCoIL was adopted and modified in [36] for providing distributed channel access in networked control systems (NCSs). Although the original mechanism was developed for networks with a single perfect shared channel, its application was later extended to WNCSs with multiple lossy channels [37]. Suppose that each subsystem is equipped with $M$ independent timers, i.e., a separate timer for each channel. At the beginning of each transmission slot $k$, subsystems set their timers and start the countdown to zero while being in listening mode. The timer values are given by

$$\tau_{i,j,k} = \frac{\lambda_j}{m_{i,j,k}} \tag{21}$$

where $\lambda_j$ is a constant specific to channel $j \in \mathcal{M}$ but identical for all $i$, and the local cost, denoted by $m_{i,j,k}$, is calculated individually for each channel. Consequently, a larger local cost corresponds to a smaller timer. For simplicity, we will assume that $\lambda_j$ is the same for all channels, i.e., $\lambda_j = \lambda$ for all $j$. Let $\{i^*, j^*\} = \arg\min_{i,j} \{\tau_{i,j,k}\}$ represent the indices of the smallest timer at $k$. As this timer reaches zero, subsystem $i^*$ switches to transmission mode and sends a flag packet on channel $j^*$ immediately, which informs the listening subsystems to stop their timers for $j^*$ and back off. Simultaneously, $i^*$ stops its running timers, i.e., withdraws from competition for the other channels, and transmits its data packet on $j^*$. By assuming that the flag packet is always detected by all the listeners and that it has a very short duration, data transmission will be collision-free. Meanwhile, the remaining subsystems compete for the available channels until all $M$ channels are allocated. As this time slot ends, the new timer values are determined based on the updated local cost ($m_{i,j,k+1}$) and the entire procedure is repeated in the next slot. Fig. 3 demonstrates how this mechanism works for an illustrative case of two subsystems sharing a channel at $k$.

The contention period can be adjusted by choosing $\lambda$ as required by the communication protocol. Its value cannot be arbitrarily small, though, because collision-free channel access requires that multiple timers do not expire within an interval shorter than the duration of the flag. This tradeoff is addressed by fine-tuning $\lambda$ for specific configurations and based on the involved control and communication parameters [36], [47]. Regarding the local cost $m_{i,j,k}$, it can be any nonzero cost, which is to be defined according to a specific design objective. Defining it is a rather challenging task since it should be such that the resulting channel access decisions accomplish the prespecified objective, whilst each subsystem is able to evaluate it based on its local information. Recall that explicit information exchange between subsystems is impossible, and thus, distributed channel access requires $m_{i,j,k}$ to be based on local information. In the following sections, we will specify this cost in a way that implementing the TBCoIL achieves the channel access objective in a distributed manner.
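The outcome of one contention period can be emulated centrally with the following minimal sketch. It assumes timers of the form $\lambda / m_{i,j,k}$, idealized flag packets of negligible duration, and distinct timer values; the function name and the local costs are hypothetical. In the actual mechanism, no central entity sorts the timers: the ordering emerges from the countdowns running in parallel at the subsystems.

```python
def timer_based_allocation(local_cost, lam=1.0):
    """Emulate one slot of timer-based contention.

    local_cost[i][j] > 0 is the local cost of subsystem i for channel j.
    Returns a dict mapping each allocated channel to the winning subsystem.
    """
    timers = sorted(
        (lam / cost, i, j)
        for i, row in enumerate(local_cost)
        for j, cost in enumerate(row)
    )
    allocation = {}  # channel -> subsystem
    busy = set()     # subsystems that already transmit on some channel
    for _, i, j in timers:  # timers expire in increasing order
        if i in busy or j in allocation:
            continue  # this timer was stopped by an earlier flag or withdrawal
        allocation[j] = i
        busy.add(i)
    return allocation


# Three subsystems competing for two channels (illustrative local costs).
costs = [[4.0, 1.0], [3.0, 2.5], [0.5, 2.0]]
allocation = timer_based_allocation(costs)
```

Here, subsystem 0 has the smallest timer and claims channel 0, after which subsystem 1 wins channel 1; each subsystem occupies at most one channel and each channel is granted to at most one subsystem.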

III. DISTRIBUTED CHANNEL ACCESS OVER KNOWN GE CHANNELS
In this section, we first demonstrate that Problem 1 can be formulated as an MDP despite the partial observations of the channel state variations. Since the complexity of solving the MDP impedes tractability, we adopt the concept of CoIL to allow for solving the problem over a finite horizon in a distributed manner. The solution is obtained by implementing a specific timer setup in TBCoIL. Then, we derive the conditions that guarantee the stability of the system under the resulting channel access scheme.
For notational convenience and without loss of generality, we drop the subscript j and consider M = 1 when necessary and then provide the generalized results by reintroducing it.

A. MDP Formulation
Problem 1 can be simplified by only considering the components of $J_\infty$ that are influenced by the channel access decisions.
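Proposition 1: Solving Problem 1 is equivalent to solving Problem 2.

Problem 2:

$$\min_{\{\Delta_k\}_{k \geq 0}} \; \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\Big\{ \sum_{k=0}^{T-1} \sum_{i \in \mathcal{N}} \mathrm{tr}\big(\Gamma_{i,\infty} P_{i,k|k}\big) \Big\} \quad \text{subject to (4), (5)}$$

where

$$\Gamma_{i,\infty} \triangleq L_{i,\infty}^T (B_i^T \Pi_{i,\infty} B_i + R_i) L_{i,\infty}$$

and $P_{i,k|k}$ is given in (16).

Proof: From [48, Lemma 6.1, Ch. 8], it follows that for the setup considered here, (9) can be written as

$$J_\infty = \sum_{i \in \mathcal{N}} \mathrm{tr}(\Pi_{i,\infty} W_i) + \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\Big\{ \sum_{k=0}^{T-1} \sum_{i \in \mathcal{N}} \mathrm{tr}\big(\Gamma_{i,\infty} P_{i,k|k}\big) \Big\}.$$

Since the first term is independent of the channel access decisions, the assertion follows.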
In order to formulate Problem 2 as an MDP, we define two additional variables, which can be inferred from the information available at the sensors. Considering $M = 1$ hereafter, we define the holding time $t^h_{i,k}$ in (25), which describes the time elapsed since $i$ transmitted successfully on the channel. In addition, we define the observation time $t^o_{i,k}$ in (26) as the time since the most recent observation of the channel state by $i$. From the definitions, we have $t^o_{i,k} \leq t^h_{i,k}$ for all $k$. Recall that keeping track of the belief in (7) is crucial for sensors since channel state variations are not constantly observed. Thanks to the definitions in (25) and (26), this belief can now be expressed in closed form as in (27), where the conditions indicate whether the most recently observed channel state was $G$ or $B$. In case of a failed transmission, i.e., channel state being $B$, the observation time is reset to zero while the holding time grows. Consequently, $t^o_{i,k} = t^h_{i,k}$ indicates that the last transmission attempt has been successful, i.e., the most recently observed channel state was $G$.
Problem 2 can be formulated as an MDP problem with an infinite time-averaged cost, which can be described by a quadruple $(\mathcal{S}, \mathcal{A}, \mathbb{P}\{\cdot \mid \cdot, \cdot\}, R(\cdot, \cdot))$.
1) The state space $\mathcal{S}$: It is the collection of all holding times and observation times, which can in turn determine the beliefs as per (27). Let a hyperstate be defined by $T_{i,k} \triangleq (t^h_{i,k}, t^o_{i,k})$. Then, the state at $k$ can be described by $s_k = (T_{1,k}, \ldots, T_{N,k})$, i.e., the collection of all hyperstates and, thus, the collection of all beliefs.
2) The action space $\mathcal{A}$: It is the set of admissible channel access decisions, with $a_k \in \mathcal{A}$ denoting the action executed at $k$.
3) The transition kernel $\mathbb{P}\{\cdot \mid \cdot, \cdot\}$: $\mathbb{P}\{s_{k+1} \mid s_k, a_k\}$ is the probability of moving from state $s_k$ to $s_{k+1}$ if the action $a_k$ is executed at $k$, and it can be written as in (28), with the transition probabilities of the individual hyperstates given in (29). Despite the possibly misleading appearance of (29), one should distinguish the transition kernel from the states. When $\delta_{i,k} = 1$, the transition probability is determined by simply substituting the holding time and observation time included in $T_{i,k}$ within (27), which yields a constant value between 0 and 1. By evaluating (29) for all $i$, one can obtain the transition probability kernel from (28).
4) The cost function $R(\cdot, \cdot)$: From Proposition 1 and (16), we obtain

$$R(s_k, a_k) = \sum_{i \in \mathcal{N}} \mathrm{tr}\big(\Gamma_{i,\infty} P_{i,k|k}\big) \tag{30}$$

where $P_{i,k|k}$ is given in (16) and depends on $t_{i,k}$ in (14), which is inferred from the holding time as in (31).

We define a policy $\pi : \mathcal{S} \to \mathcal{A}$ to be a mapping from the states to actions and denote by $\Pi$ the set of all admissible policies. The goal of the MDP is to find the optimal policy, which minimizes the expectation of the time-averaged cost over the infinite horizon as in (32). This framework is applicable to the case of $M > 1$ by considering the hyperstates for each wireless link. Thus, the state space is $\mathcal{S} = \mathbb{Z}_{\geq 0}^{2NM}$ and the action space and transition probabilities are also defined accordingly. In principle, after truncating $\mathcal{S}$ to a finite state space, solving (32) by dynamic programming techniques, e.g., using policy iteration or relative value iteration, is possible. However, even for the simplest case of $M = 1$, as the number of subsystems grows linearly, the number of states grows exponentially, and finding the optimal policy is shown to be PSPACE-hard [49]. Although by choosing a finite horizon in (32) the problem becomes computationally feasible for approximate methods, a central network manager with access to the information of all subsystems is required to solve the problem and allocate the channels accordingly. Hereafter, we will instead consider the problem of minimizing the expected immediate cost at each time step, which is stated as Problem 3 in (33).
As will become apparent in the following section, the channel access policy for solving Problem 3, i.e., $\Delta_k$ as defined in (18), can be determined in a distributed manner as required by the WNCS architecture.

B. Distributed Channel Access
At the beginning of each time slot $k$, the sensors decide whether to transmit within that slot based on their local information $\mathcal{I}^s_{i,k}$, which is given in (34). Note that it is implicitly assumed that the available information at sensor $i$ contains the past control inputs of $i$. The sensor does not require additional communication from the controller and can infer such information from the knowledge of the control law (10) and by utilizing the ACK/NACK signal to determine the state estimate at the controller side.
The transmission decisions and their outcomes are sufficient for inferring the holding time and the observation time, and therefore, (34) is sufficient for evaluating the belief at $k$. By utilizing this information, the CoIL for minimizing the stage cost can be derived in a similar way to (20). Let $\mathcal{I}^s_k \triangleq \bigcup_{i \in \mathcal{N}} \mathcal{I}^s_{i,k}$. From Proposition 1, it follows that $J_k = \sum_{i=1}^{N} \mathrm{tr}(\Gamma_{i,\infty} P_{i,k|k})$, which is the same as the immediate cost in (30). As a result, the expected stage cost given $\mathcal{I}^s_k$ can be expanded as in (35), where step (a) holds since the channel states evolve independently of the dynamics and, for subsystems that do not transmit at $k$, i.e., $i \in \bar{\mathcal{F}}_k$, $\delta_{i,k} = 0$. Since (34) is sufficient for inferring the holding time and the observation time, the sensors can compute the belief as per (27), which yields step (b); finally, step (c) is obtained by rearranging the terms. As a result, the optimal channel access problem for minimizing the stage cost is equivalent to finding $\mathcal{F}_k$ such that the last summation in (35) is maximized. In accordance with the original definition, the CoIL for subsystem $i$ at $k$ can be formulated as in (36). Since the sensors can keep track of the belief over all channels in case $M > 1$, it readily follows that by reintroducing the corresponding subscript in (35), Problem 3 can be formulated as

$$\max_{\Delta_k} \sum_{i \in \mathcal{N}} \sum_{j \in \mathcal{M}} \delta_{i,j,k}\, \mathrm{CoIL}_{i,k}\, b_{i,j,k} \tag{37}$$

subject to (4), (5).
As mentioned in Section II-E, if local information is sufficient for determining $m_{i,j,k}$ in (21), the TBCoIL ensures that channel access is granted to the subsystems with the highest cost in a distributed manner while inherently satisfying constraints (4) and (5). Furthermore, each subsystem $i$ only utilizes its local information for evaluating $\mathrm{CoIL}_{i,k}$ and $b_{i,j,k}$ as per (36) and (27), respectively. Therefore, by letting $m_{i,j,k} = \mathrm{CoIL}_{i,k}\, b_{i,j,k}$, we obtain

$$\tau_{i,j,k} = \frac{\lambda}{\mathrm{CoIL}_{i,k}\, b_{i,j,k}}. \tag{38}$$

Consequently, using these values in the TBCoIL determines $\Delta_k$ in a distributed fashion. Furthermore, since the evolution of CoIL and belief are independent of the control actions, the certainty equivalence principle holds and the controller given in (10) is optimal for this channel access policy. Note that even in case of networks containing multiple subsystems with identical dynamics, this setup leads to collision-free channel access since the set of channel parameters $p_{i,j}$ and $q_{i,j}$ that coincide across subsystems has Lebesgue measure zero. In other words, subsystems will almost surely have distinct beliefs and, thus, distinct timer values. Additionally, in case the network protocol requires bitwise arbitration for granting channel access, collision-free transmission can be guaranteed by implementing a method such as the one proposed in [50], where contention is based on dynamic and static identifiers. In such settings, the timer value in (38) can be utilized for assigning the dynamic identifiers, while the distinct static identifier is assigned as in [50].
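The following minimal sketch illustrates how a sensor could evaluate its timer from purely local quantities for a scalar subsystem. It assumes that the CoIL takes the form $\Gamma\,(h(P_{k-1|k-1}) - P)$, i.e., the weighted difference between the error variance after one more slot without reception and the variance of a fresh sensor estimate; this form is derived here from the definitions of $E^0$ and $E^1$ and is an assumption of the sketch, as are all function names and numerical values.

```python
def lyapunov_step(a, w, x):
    """Scalar Lyapunov operator h(x) = a * x * a + w."""
    return a * x * a + w


def coil_scalar(gamma, a, w, p_bar, p_prev):
    """Assumed scalar CoIL: weighted variance without reception minus with reception.

    p_bar: variance of a fresh sensor estimate; p_prev: estimator variance
    at the previous slot.
    """
    return gamma * (lyapunov_step(a, w, p_prev) - p_bar)


def timer_value(lam, coil, belief):
    """Timer obtained from the local cost m = CoIL * belief."""
    return lam / (coil * belief)


a, w, p_bar, gamma, lam = 1.2, 1.0, 0.5, 1.0, 1.0  # illustrative values
# Subsystem 0 received a packet in the previous slot; subsystem 1 missed one.
coil_0 = coil_scalar(gamma, a, w, p_bar, p_prev=p_bar)
coil_1 = coil_scalar(gamma, a, w, p_bar, p_prev=lyapunov_step(a, w, p_bar))
timers = [timer_value(lam, coil_0, belief=0.8), timer_value(lam, coil_1, belief=0.6)]
```

Although subsystem 1 has a lower belief of its link being good, its larger CoIL yields the smaller timer, so it would win the contention in this example.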

C. Stability Analysis
We investigate the stability of the WNCSs in which timers are employed as per (38) by considering the Lyapunov mean square stability criterion.For ease of exposition, the subscript corresponding to the index of a subsystem is dropped in Definition 1 and Lemma 1.
Definition 1 (Lyapunov mean square stability [51]): The equilibrium solution is said to possess stability of the second moment if, given $\varepsilon > 0$, there exists $\xi(\varepsilon)$ such that (39) holds.

Lemma 1: For the architecture considered in this work, (39) is equivalent to the existence of $\phi$ satisfying $0 < \phi < \varepsilon$ such that $\operatorname{tr}(E\{P_{k|k}\}) < \phi$. (40)

Proof: Let $A_L \triangleq A + B L_\infty$ and $e_{k|k} \triangleq x_k - \hat{x}_{k|k}$. The state dynamics in (1a) can be rewritten as due to the fact that $w_k$ is zero-mean and independent of the state and its estimate. Furthermore, From the definition of the error covariance matrix at the estimator and the law of total expectation it follows that whose boundedness guarantees stability as per Definition 1. Due to the following property [52, Fact 8.12.28] we conclude that boundedness of $E\{P_{k|k}\}$ ensures that the second term in (42) is bounded. Additionally, thanks to the perfect communication link between the controller and actuators, boundedness of $E\{P_{k|k}\}$ guarantees that the feedback gain $L_\infty$ is stabilizing [4]. Since the certainty equivalence principle holds, the adopted controller ensures boundedness of the state estimate in steady state. Hence, the first term in (42) is bounded. Thus, the existence of $0 < \phi < \infty$ such that $\operatorname{tr}(E\{P_{k|k}\}) < \phi$ ensures that (42) is bounded by some $\varepsilon < \infty$, which is greater than $\phi$ due to the nonnegativity of all terms in (42), thus completing the proof.

As a result of Lemma 1, the entire system is stable in the sense of Definition 1 if and only if there exists $0 < \phi_i < \infty$ such that $\operatorname{tr}(E\{P_{i,k|k}\}) < \phi_i$ for all $i \in \mathcal{N}$. Note that the time elapsed since the last successful packet reception at the estimator, i.e., $t_{i,k}$, is sufficient for computation of the error covariance as in (44), where $\sum_{c=1}^{0} \triangleq 0$. In the following, we take advantage of the ergodicity of the process $t_{i,k}$ to derive stability conditions. The following illustrative example demonstrates how the Markov chain modeling $t_{i,k}$ can be constructed and analyzed for two unstable subsystems sharing a single channel.
Example 1: Consider a WNCS that consists of two unstable subsystems and a single channel, i.e., $N = 2$ and $M = 1$, and the channel access is granted by utilizing the timer setup in (38). Although the channel access decisions are time-varying, the evolution of the system can be described by a Markov chain such that these deterministic decisions are only dependent on the state of the chain. Let $S = \mathbb{Z}^{4}_{\ge 0}$ denote the state space of a four-dimensional Markov chain, where each state $\{(l, l'), (m, m')\}$ corresponds to $t^{h}_{1,k} = l$, $t^{o}_{1,k} = l'$, $t^{h}_{2,k} = m$, and $t^{o}_{2,k} = m'$. Therefore, according to their respective definitions in (25) and (26), the state space can be reduced to all $\{(l, l'), (m, m')\} \in \mathbb{Z}^{4}_{\ge 0}$ such that $l' \le l$ and $m' \le m$. Since knowledge of the holding time and observation time is sufficient for determining the CoIL (36) and the belief (27), the timer values and the resulting channel access decisions are state-dependent. We denote the decisions by $\eta \in \{0, 1\}$, where $\eta = 0$ and $\eta = 1$ correspond to $\Delta = [1\ 0]^{T}$ and $\Delta = [0\ 1]^{T}$, respectively. As a result, the (possibly) nonzero transition probabilities are

$P\{\{(l+1, l'+1), (0, 0)\} \mid \{(l, l'), (m, m')\}, \eta\} \triangleq \xi_3 = \eta b_2$ (46c)

where $b_i$ is the belief of subsystem $i$ (27), which is also state-dependent despite not being included in the notation for ease of exposition.
In order to describe the transition probability matrix in a compact form, we use the following convention: where $\Xi_3 = [\,0_{(l+1)\times 1}\ \ \xi_3 I_{l+1}\,]$ and $\Xi_1 = \xi_1 1_{(l+1)\times 1}$. As a result, the transition probability matrix of the chain, denoted by $P$, can be formed as shown in (48). Note that the state $\{0, 0\}$ is transient and it only exists when initiating, and thus, we exclude it from the chain. Furthermore, the unreachable states are removed from the chain and the transition probability matrix is modified accordingly, in order to ensure that the resulting chain has a unique stationary distribution. More specifically, the communication constraints imply that both subsystems cannot transmit simultaneously. Consequently, the states $\{(l, l'), (m, m')\} \in S$ that satisfy $l = m$, $l' = m'$, $l = m'$, or $l' = m$ are unreachable. By excluding such states from the state space, the resulting chain has a single communicating class and it is irreducible, aperiodic, and positive recurrent. Hence, it has a unique stationary distribution denoted by $\pi$, which is found by solving

$\pi P = \pi, \quad \pi \mathbf{1} = 1$ (49)

where $\mathbf{1}$ is the all-ones column vector of appropriate dimensions [53, Ch. 1]. With respect to the introduced notation in (47a), we can write $\pi = [\pi_{\{0,1\}}, \pi_{\{0,2\}}, \ldots]$, where the dimension of $\pi$ complies with the transition probability matrix $P$, and the invariant probability of holding times at each subsystem is found by solving (49). Since $M = 1$, $t_{i,k} = t^{h}_{i,k}$ and we define the corresponding stationary probabilities $\mu_i(t)$, which are essential for the stability analysis.

Remark 1: Regarding the properties of the discussed Markov chain, note that the CoIL of unstable subsystems grows exponentially with respect to the time elapsed since the last successful transmission. Since all subsystems in this work are assumed to be unstable, regardless of their specific characteristics and the parameters of the communication channels, a subsystem $i$ with a large enough holding time will attempt to transmit until its packet goes through, meaning that eventually $T_{i,k} = (0, 0)$. As a consequence, all states are accessible from each other (communicating), which ensures that the chain is irreducible. The chain is indeed aperiodic due to the possibility of packet dropouts, which means that all the nonzero transition probabilities are less than 1. Moreover, from the preceding discussion it follows that the waiting time for the chain to return to a state is almost surely finite, meaning that the chain is positive recurrent. Hence, the chain has a unique stationary distribution.
The method described in Example 1 can readily be applied to larger WNCSs. In such settings, the state space is given by $S = \mathbb{Z}^{2NM}_{\ge 0}$ and each recurrent state can possibly transition to $N!/(N - M)!$ other states. Despite the larger state space, in principle, the transition probability matrix can be formed similarly. By removing the transient states as discussed, the resulting chain will have a unique limiting distribution and thus $\mu_i(t)$ can be determined for all $i \in \mathcal{N}$ and $t \ge 0$ accordingly. The following result demonstrates how the boundedness of $\operatorname{tr}(E\{P_{k|k}\})$ in Lemma 1 and $\mu_i(t)$ are connected.
Theorem 1: The proposed channel access method stabilizes the WNCS in the sense of Definition 1 if the condition in (52) holds for all $i \in \mathcal{N}$.

Proof: The chain is irreducible, aperiodic, and positive recurrent. Thus, the ergodic theorem allows us to write the limit of the expected value of (44) as in (53). Subsequently, we obtain the series in (54). Similar to the proof in [40, Th. 1], by Cauchy's root test, this series is convergent if (55) holds, and from Gelfand's formula, we obtain Hence, if (52) holds for all $i \in \mathcal{N}$, $\lim_{k \to \infty} E\{P_{i,k|k}\}$ in (54) is bounded. Thus, $0 < \phi_i < \infty$ exists such that $\operatorname{tr}(E\{P_{i,k|k}\}) < \phi_i$ and the assertion follows.
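Gelfand's formula, which is invoked in the proof above, states that $\|A^{t}\|^{1/t}$ tends to the spectral radius of $A$ for any matrix norm. The following sketch checks this numerically with the Frobenius norm. The matrix, the function names, and the chosen horizons are illustrative assumptions and are not taken from the article.

```python
import math


def mat_mul(X, Y):
    """Product of two square matrices stored as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def gelfand_estimate(A, t):
    """Return ||A^t||_F ** (1/t), which tends to the spectral radius as t grows."""
    power = [row[:] for row in A]
    for _ in range(t - 1):
        power = mat_mul(power, A)
    frobenius = math.sqrt(sum(v * v for row in power for v in row))
    return frobenius ** (1.0 / t)


# Illustrative non-diagonalizable matrix whose spectral radius is 1.2.
A = [[1.2, 1.0], [0.0, 1.2]]
estimates = {t: gelfand_estimate(A, t) for t in (10, 50, 400)}
print(estimates)
```

For this matrix the estimates decrease toward 1.2 as $t$ grows, which illustrates why the root test on a series involving $\|A_i^{t}\|^{2}$ can be expressed through the spectral radius of $A_i$.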
As in Example 1, finding an analytical expression for $\mu_i(t)$ to evaluate (52) is not always possible. Despite this, Theorem 1 can be used for examining stability in practice by means of the p-series convergence test, as will be shown in Section V-A.

IV. CHANNEL ACCESS OVER AN UNKNOWN GE CHANNEL
Implementing the TBCoIL according to (38) assumes complete knowledge of the transition probabilities of the GE model. However, this is a strong assumption and such information is not known a priori in practice. This assumption can be relaxed by adopting a Bayesian learning method, which maintains a probability distribution over the possible settings of each unknown parameter. We first address how a new channel state observation can be incorporated for updating the prior distribution over the unknown parameters. Then, we propose a heuristic posterior sampling algorithm for computational tractability in practice and exploit the learning outcome for providing channel access with TBCoIL.

A. Bayesian Framework
In the Bayesian approach, an initial prior distribution is assumed over the unknown parameters, and the posterior distribution is updated using Bayes' rule. The unknown channel parameters are within the interval $[0, 1]$ and they can be viewed as random variables consisting of the number of successes in Bernoulli trials with unknown probabilities of success $p$ and $q$. Here, we drop the subscripts for distinguishing each wireless link for ease of exposition. The Beta distribution is the conjugate prior for the Bernoulli distribution. Therefore, we assume that the prior distributions of the unknown transition probabilities of the GE model, i.e., $p$ and $q$, follow the Beta distribution. Furthermore, they are independent, which yields

$P\{p, q; \Phi\} = P\{p; \phi_1, \phi_2\}\, P\{q; \phi_3, \phi_4\}$

where

$P\{p; \phi_1, \phi_2\} = \dfrac{p^{\phi_1 - 1}(1 - p)^{\phi_2 - 1}}{B(\phi_1, \phi_2)}, \qquad P\{q; \phi_3, \phi_4\} = \dfrac{q^{\phi_3 - 1}(1 - q)^{\phi_4 - 1}}{B(\phi_3, \phi_4)}$

and $B(\cdot)$ denotes the Beta function. These prior distributions are parameterized by $\Phi = [\phi_1\ \phi_2\ \phi_3\ \phi_4] \in \mathbb{Z}^{4}_{\ge 0}$, which we will refer to as the posterior count. This choice of prior distribution greatly facilitates the posterior update. More specifically, after new observations are made, the posterior update can easily be done by updating the posterior counts $(\phi_1, \phi_2)$ for $p$ and $(\phi_3, \phi_4)$ for $q$.
Example 2: Consider that the channel state is G and $\Phi = [1, 2, 2, 3]$. Then, we observe that the channel stays G (G to G transition with probability $1 - p$) for the first three time steps and then transitions to B (G to B with probability $p$). The updated posterior count is then simply calculated as $\Phi = [1 + 1,\ 2 + 3,\ 2,\ 3] = [2, 5, 2, 3]$.

Let $o_k \in \{G, B, Z\}$ denote the observation at $k$, where $o_k = Z$ represents no transmission attempt at $k$. More specifically, if the sensor transmits at $k$, the actual channel state $c_k \in \{G, B\}$ is observed and $o_k = c_k$. Otherwise, $o_k = Z$, which corresponds to not observing the actual channel state. We denote the channel state history and observation history up to $k$ by $c^{k}$ and $o^{k}$, respectively. Then, the joint probability distribution of the channel state at $k$ and the transition probabilities $p$ and $q$ given the observation history $o^{k-1}$ is given by Multiple state histories can lead to the same posterior count. Consider the scenario in which there are $a$, $b$, $c$, and $d$ number of G to B, G to G, B to G, and B to B state transitions, respectively. Regardless of the order in which the state transitions occur, we have where we used the fact that $P\{p, q\} = 1$. Let $C(o^{k-1})$ denote all possible state histories based on the observation history $o^{k-1}$, which is given by Let the total number of state histories that lead to the same posterior count $\Phi$ be denoted by $\Psi(\Phi, C(o^{k-1}), c_k)$, which we will refer to as the appearance count. The posterior distribution can be fully described by the appearance count associated with each posterior count and channel state, up to the normalization term $P\{o^{k-1}\}$. More specifically, by moving the normalization term to the left side of the equation, we can rewrite (60) as
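The count update of Example 2 can be reproduced with a few lines of code. The sketch assumes the ordering used in the text, i.e., $\phi_1$ and $\phi_2$ count G-to-B and G-to-G transitions and $\phi_3$ and $\phi_4$ count B-to-G and B-to-B transitions; the function name `update_posterior_count` is hypothetical.

```python
def update_posterior_count(phi, states):
    """Update [phi1, phi2, phi3, phi4] from a fully observed state sequence.

    states: sequence of "G"/"B" symbols, starting with the state before the first transition.
    """
    index = {("G", "B"): 0, ("G", "G"): 1, ("B", "G"): 2, ("B", "B"): 3}
    updated = list(phi)
    for prev, nxt in zip(states, states[1:]):
        updated[index[(prev, nxt)]] += 1   # one more observation of this transition type
    return updated


# Example 2: start in G, stay in G for three steps, then move to B.
phi = update_posterior_count([1, 2, 2, 3], ["G", "G", "G", "G", "B"])
print(phi)                       # [2, 5, 2, 3]
print(phi[0] / (phi[0] + phi[1]))  # mean of the Beta posterior for p
```

The order of the observed transitions does not matter for the result, which is the property exploited when several state histories are grouped under one posterior count.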
When a new observation is obtained at $k$, the posterior at time $k + 1$ is updated recursively as follows: As a result of (64), the update has a simple form for each posterior count. Furthermore, the number of posterior counts remains unchanged whenever the channel state is observed, i.e., $o_k \in \{G, B\}$. Otherwise, this number grows by a factor of at most two.
where Φ = [φ 1 φ 2 + 1 φ 3 φ 4 ] and it can readily be used as the prior for the next time step.If o k = Z, the same posterior update is given by iterating over both possibilities for the channel state at k, i.e.,

P{G, p, q|o
which can increase the number of posterior counts. Fig. 4 illustrates how the posterior counts and their respective appearance counts are updated with respect to the obtained observation.

B. Online Learning Through the TBCoIL
The aforementioned method allows for incorporating the uncertainty in the transition probabilities in the decision making process. Due to the lack of a priori knowledge of the underlying channel parameters, the belief for implementing the setup in (38) cannot be directly evaluated as per (27). Nonetheless, in principle, the belief can be inferred from the joint distribution of the channel state and its parameters in the aforementioned framework. In practice, however, this method is computationally infeasible since, whenever the sensor does not transmit over a link, the number of posteriors for that link grows and inevitably goes to infinity over time.
To circumvent the curse of dimensionality, we propose a heuristic method by combining the idea of approximate belief monitoring [54] and the posterior sampling algorithm proposed in [55]. In essence, after each update, only $K$ posterior counts are kept, which are drawn randomly with respect to the respective appearance counts. Algorithm 1 presents how, at any time $k$, sensor $i$ evaluates its belief for channel $j$, which is denoted by $b^{L}_{i,j,k}$. This belief is incorporated in TBCoIL for providing channel access as in (67).

We define $\zeta_G \triangleq \{\Phi, \Psi, P\}$ as the posterior count $\Phi$ with appearance count $\Psi$ for being in state G, which has the probability $P$. In case of successful transmission at $k - 1$, the posterior for computation of the belief at $k$ is obtained by considering the possible state transitions from $\zeta_G$, which could be to G, denoted by $\zeta_{G2G}$, or to B, denoted by $\zeta_{G2B}$. The transition probabilities depend on $p$, which is the mean of the Beta distribution associated with the posterior count, i.e., $p = \phi_1/(\phi_1 + \phi_2)$. Similarly, $\zeta_B \triangleq \{\Phi, \Psi, P\}$ denotes the parameters corresponding to the B state, which can transition to G or B, i.e., $\zeta_{B2G}$ and $\zeta_{B2B}$, respectively, with $q = \phi_3/(\phi_3 + \phi_4)$. The updated posteriors are formed in Line 9, where $\cup$ denotes merging the identical posterior counts by summing the respective appearance counts and $P$. Then, $K$ posterior counts are chosen randomly such that the probability of a posterior count being selected is proportional to the associated appearance count. Finally, the probabilities $P$ are normalized for the remaining posterior counts and the learned belief is determined by summing the probability of all the posteriors of being in G, as in Line 13.
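The branching, merging, and sampling steps described above can be sketched as follows. This is not the listing of Algorithm 1, which is not reproduced here; it is a minimal illustration for the case in which the channel state is not observed, under the assumptions that each particle is keyed by its state and posterior count and stores its appearance count and probability, and that $K$ particles are drawn without replacement with probability proportional to the appearance count. All names are hypothetical.

```python
import random


def predict(particles):
    """Branch every particle one step ahead and merge identical posterior counts.

    particles: dict mapping (state, phi) -> [appearance_count, probability],
    with phi = (phi1, phi2, phi3, phi4) and all entries of phi at least 1.
    """
    merged = {}
    for (state, phi), (psi, prob) in particles.items():
        f1, f2, f3, f4 = phi
        if state == "G":
            p = f1 / (f1 + f2)                       # posterior mean of p
            branches = [("B", (f1 + 1, f2, f3, f4), p),
                        ("G", (f1, f2 + 1, f3, f4), 1.0 - p)]
        else:
            q = f3 / (f3 + f4)                       # posterior mean of q
            branches = [("G", (f1, f2, f3 + 1, f4), q),
                        ("B", (f1, f2, f3, f4 + 1), 1.0 - q)]
        for nxt, new_phi, weight in branches:
            entry = merged.setdefault((nxt, new_phi), [0, 0.0])
            entry[0] += psi                          # sum appearance counts
            entry[1] += prob * weight                # sum probabilities
    return merged


def resample(particles, K, rng):
    """Keep at most K particles, drawn proportionally to appearance count."""
    keys, kept = list(particles), {}
    while keys and len(kept) < K:
        weights = [particles[key][0] for key in keys]
        key = keys.pop(rng.choices(range(len(keys)), weights=weights)[0])
        kept[key] = particles[key]
    total = sum(value[1] for value in kept.values())
    return {key: [value[0], value[1] / total] for key, value in kept.items()}


def learned_belief(particles):
    """Sum the probabilities of all particles whose state is G."""
    return sum(prob for (state, _), (_, prob) in particles.items() if state == "G")


rng = random.Random(0)
particles = {("G", (1, 1, 1, 1)): [1, 1.0]}          # uniform priors, channel last seen in G
for _ in range(6):                                   # six slots without observation
    particles = resample(predict(particles), K=4, rng=rng)
print(len(particles), learned_belief(particles))
```

Without the sampling step the number of particles could double in every unobserved slot, so capping it at $K$ keeps the per-slot cost bounded at the price of an approximate belief.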
Remark 2: Typically, the initial probability distribution over the unknown parameters is assumed to be uniform and, thus, Φ = [1, 1, 1, 1] when initiating. To ensure that implementing (67) guarantees collision-free channel access even in homogeneous WNCSs, the initial posterior count can be set to Φ is chosen randomly by subsystems for each link. This ensures that b L i,j,k is Lebesgue measure zero and by choosing α 1 the impact of the biased priors becomes negligible.

Algorithm 1: Posterior Sampling of Sensor i for Channel j at k.
Remark 3: When the idea of approximate belief monitoring is applied for a single agent interacting with an unknown environment, accurate convergence is guaranteed since all uncertainty is represented explicitly [41], [56]. Proving the convergence of Algorithm 1 is, however, a challenging open problem. In addition to the unknown channel parameters, the decisions and, thus, observations are determined by the outcome of implementing the TBCoIL, which is highly influenced by the time-varying CoIL. Although more unstable subsystems observe the channel states more frequently, all subsystems eventually make sufficient observations due to the exponential growth of CoIL. Therefore, convergence can be conjectured, which is supported by the simulations in Section V.

V. NUMERICAL RESULTS
In this section, we first present a method for examining the stability of the system in Example 1. Next, the effect of channel access decisions on the performance of the learning algorithm is demonstrated. Finally, we examine the performance of the proposed timer setups for known and unknown GE channel parameters. The following results are obtained for Q = I 2 and

A. Stability Evaluation
This section presents a numerical approach for examining the stability of two identical subsystems sharing a channel, as presented in Example 1. Although the discussed Markov chain has a countably infinite state space, we first assume that the maximal interval between two successful transmissions is finite. This will enable us to determine the stationary distribution analytically and conjecture the convergence of the infinite series in (54), and consequently, whether the condition in (52) holds. To this end, we consider the truncated chain with a finite state space with $0 \le l \le \bar{l}$ and $0 \le m \le \bar{m}$. This corresponds to assuming a maximal interval of $\bar{l}$ for successful transmission of Subsystem 1 and $\bar{m}$ for Subsystem 2. Let $\bar{P}$ denote the transition probability matrix of the new chain, which is obtained by truncating $P$ in (48). Since $\bar{P}$ is row stochastic, irreducible, and aperiodic, the stationary distribution can be obtained by [57]

$\pi = \mathbf{1}^{T}(I - \bar{P} + D)^{-1}$ (68)

where $I$, $D$, and $\mathbf{1}$ are the identity matrix, the all-ones matrix, and the all-ones column vector of appropriate dimensions. To examine whether the series on the right-hand side of (54) is convergent, we utilize the p-series convergence test. Hence, if $p > 1$ and then (55) holds, which guarantees stability. By using the numerical values obtained from (68) for a finite horizon, one can examine the behavior of (69) and conjecture whether the condition in Theorem 1 holds. Fig. 5 illustrates the values of $\mu_i(t)\,\|A_i\|^{2t}$ and $\beta/t^{p}$ as a function of $t$ given that $p = 2$ and $\beta = 100$ with the system matrix $A_i = 1.2 I_2$. Furthermore, the GE transition probabilities for Subsystem 1 and Subsystem 2 are assumed to be $p_1 = 0.25$, $q_1 = 0.80$, and $p_2 = 0.35$, $q_2 = 0.70$, respectively. As the results indicate, $\mu_i(t)\,\|A_i\|^{2t}$ monotonically decreases as $t$ increases for $t \ge 2$ for both subsystems. Therefore, since the convergent series lim recovery rates are reduced to $q_1 = 0.20$ and $q_2 = 0.10$, however, $\mu_i(t)\,\|A_i\|^{2t}$ becomes an increasing function of $t$, as depicted in Fig. 6. This indicates that the left-hand side of (69) is not necessarily bounded and, thus, stability of the system cannot be guaranteed.
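The stationary distribution of a finite, irreducible chain can be computed from the identity $\pi(I - \bar{P} + D) = \mathbf{1}^{T}$, which follows from $\pi \bar{P} = \pi$ and $\pi \mathbf{1} = 1$ when $D$ is the all-ones matrix. The sketch below solves this linear system by Gauss-Jordan elimination. It is applied to a two-state GE chain only as a stand-in, since the truncated matrix built from (48) is not reproduced here; the function name is hypothetical.

```python
def stationary_distribution(P):
    """Solve pi (I - P + D) = 1^T for a row-stochastic, irreducible matrix P."""
    n = len(P)
    # Transposed system (I - P + D)^T pi^T = 1, stored as an augmented matrix.
    M = [[(1.0 if r == c else 0.0) - P[c][r] + 1.0 for c in range(n)] + [1.0]
         for r in range(n)]
    for col in range(n):
        # Partial pivoting for numerical robustness.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[r][n] / M[r][r] for r in range(n)]


# Stand-in example: GE chain with p = 0.25 (G to B) and q = 0.80 (B to G), states ordered (G, B).
pi = stationary_distribution([[0.75, 0.25], [0.80, 0.20]])
print(pi)   # approximately [q / (p + q), p / (p + q)]
```

Once $\pi$ is available for the truncated chain, the probabilities $\mu_i(t)$ are obtained by summing the entries of $\pi$ over the states with the corresponding holding time, after which the comparison against $\beta/t^{p}$ is a direct numerical check.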

B. Unknown GE Model Parameters
To demonstrate the impact of the dynamics of subsystems on the outcome of the learning algorithm, we first consider the setup in the previous section, where two identical subsystems with $A = 1.2 I_2$ compete for transmitting over one channel. Fig. 7 illustrates how their learned belief evolves over time compared with the actual belief (7) when the setup in (67) is utilized. Due to the identical dynamics and, consequently, identical growth rate for CoIL, both subsystems share the channel fairly, and both learn the belief with high accuracy. However, when the dynamics of Subsystem 1 change to $A = 1.05 I_2$, Subsystem 2 is expected to transmit more frequently due to its larger eigenvalue, i.e., faster increase of CoIL. Consequently, Subsystem 2 observes the channel states more frequently, leading to higher accuracy of its learned belief, as depicted in Fig. 8.

C. Performance Evaluation
To evaluate the performance of the proposed setups for solving Problem 3, we consider WNCSs with $N \in \{8, 16, 24, 32\}$ identical subsystems with $A = 1.2 I_2$ and $M \in \{6, 12, 18, 24\}$ channels. The channel parameters are chosen randomly while satisfying $0.2 \le p_{i,j}, q_{i,j} \le 0.5$ for all $i$ and $j$. As the benchmark, we consider a scenario in which a central coordinator prioritizes subsystems with respect to CoIL only and assigns a random channel to each of the $M$ subsystems with the largest CoIL. As expected, with a priori knowledge of the transition probabilities of the GE model, the setup in (38) with the known belief $b_{i,j,k}$ significantly reduces the incurred cost, as depicted in Fig. 9. This is in sharp contrast with adopting the stationary belief $b_{i,j,\infty}$ (8), which leads to the worst performance. Without any prior knowledge of the channel parameters, utilizing the learned belief from Algorithm 1 in setup (67) results in up to 25% lower cost. This setup outperforms the algorithm proposed in [1], which is represented by $b^{L,\mathrm{old}}_{i,j,k}$. To demonstrate the significance of tailoring a learning method for the GE channel model, we compare the results with the timer setup proposed in [37], where the UCB-V algorithm [58] is adopted for providing channel access over unknown i.i.d. channels. For smaller networks, this model mismatch leads to a considerable increase in cost. As the size of the WNCS grows, the number of unobserved channel states increases, which leads to more exploration of the learning method rather than exploitation. Nevertheless, even in such settings, Algorithm 1 leads to better performance in terms of reducing the cost (9). The same trend can be observed in heterogeneous WNCSs as illustrated in Fig. 10, where the dynamics of half the subsystems are changed to $A = 1.05 I_2$.

Fig. 8. Accuracy of the learned belief for two different subsystems sharing a single channel. Subsystem 1 (top) is less unstable ($A_1 = 1.05 I_2$) than Subsystem 2 (bottom) with system matrix $A_2 = 1.2 I_2$.

Fig. 9. Reduction in the average quadratic cost (9) achieved by using timer setups with the known belief $b_{i,j,k}$ (38), learned belief $b^{L}_{i,j,k}$ (67), learned belief proposed in [1] denoted by $b^{L,\mathrm{old}}_{i,j,k}$, stationary belief $b_{i,j,\infty}$, and UCB-V [58] as proposed in [37]. The number of available channels is $M = N/2$. A setup where the channels are selected randomly by utilizing $\mathrm{CoIL}_{i,k}$ as the local measure in (21) is chosen as the benchmark.

Fig. 10. Reduction in the average quadratic cost (9) achieved by using timer setups with the known belief $b_{i,j,k}$ (38), learned belief $b^{L}_{i,j,k}$ (67), learned belief proposed in [1] denoted by $b^{L,\mathrm{old}}_{i,j,k}$, stationary belief $b_{i,j,\infty}$, and UCB-V [58] as proposed in [37]. The number of available channels is $M = N/4$. A setup where the channels are selected randomly by utilizing $\mathrm{CoIL}_{i,k}$ as the local measure in (21) is chosen as the benchmark.

A. Conclusion
We presented a novel method for providing distributed channel access in WNCSs with correlated packet dropouts. We formulated the optimal channel access problem for minimizing the infinite-horizon LQG cost as an MDP despite the partial observability of the channel state variations. We then adopted the concept of CoIL for circumventing the computational complexity of the MDP and showed that its computation requires no information exchange between subsystems. Based on this, we proposed a timer setup for providing distributed channel access by TBCoIL and derived the conditions under which implementing this mechanism ensures mean square stability of the system. We further investigated the scenario in which the underlying channel parameters are not known a priori and adopted a Bayesian framework for incorporating the information obtained by channel state observations in estimating the channel quality. We then proposed a computationally efficient heuristic algorithm, which allows for control-aware exploration/exploitation via TBCoIL. The simulations showed that this setup leads to significant improvement compared with allocating the resources with respect to control performance only.

B. Future Directions
Interesting future research directions include considering the scenario in which the channel model varies over time and devising learning methods that are able to detect this variation and adapt accordingly. Another challenging open question is how the stability framework can be modified such that it is applicable to WNCSs containing both stable and unstable subsystems.
inference, distributed channel access, Gilbert-Elliott channel, online learning, wireless networked control systems (WNCSs).

I. INTRODUCTION
RECENT technological advancements have enabled mass production of low-power wireless sensors with high

Manuscript received 27 July 2022; revised 28 December 2022; accepted 8 May 2023. Date of publication 25 May 2023; date of current version 5 December 2023. The work of Tahmoores Farjam was supported by the Academy of Finland under Grant 13346070. The work of Themistoklis Charalambous was supported in part by the Academy of Finland under Grant 317726, and in part by the European Research Council (ERC) under the European Union's Horizon 2022 research and innovation programme under Grant 101044629. An earlier version of this paper was presented in part at the European Control Conference, 2019 [DOI: 10.23919/ECC.2019.8796177]. Recommended by Associate Editor F. Pasqualetti. (Corresponding author: Themistoklis Charalambous.)

Fig. 2. Two-state Markov chain of the GE channel model.

Fig. 3. Two subsystems sharing a single channel via timers at k. Subsystem 1 has a smaller timer (τ 1,k <τ 2,k ) and claims the channel.

Fig. 4. Graphical representation of the update procedure when the channel state is not observed at k and it is G at k + 1. The contents of each rectangle are the channel state, posterior count, and appearance count, respectively. Note that since o k = Z, the number of possible posterior counts increases.

Fig. 7. Accuracy of the learned belief for two identical subsystems sharing a single channel.