Strategic Age of Information Aware Interaction Over a Relay Channel

Age of Information (AoI) is a metric often used to represent the freshness of the information exchanged between a sensing source and a receiver. We consider a system where these two nodes are connected through an error-prone timeslotted channel, and a relay node is also present to assist the transmission. We consider both the sensor and the relay as intermittently and independently active nodes, whose activity rate may be adjusted, resulting in different levels of freshness and corresponding energy costs. To this end, the activity pattern can either follow a Bernoulli random process or a periodic duty cycle with adjustable duration. After computing the expected AoI and the complete Peak Age of Information (PAoI) distribution for both cases, we consider a fully distributed game theoretic duty cycle optimization, in which the two nodes independently tune their own activity rate, finding a balance between freshness and cost. The equilibrium of the resulting game is found to be both efficient from the perspective of the resulting performance and computationally lightweight for a distributed robust control implementation.


Federico Chiariotti, Member, IEEE, and Leonardo Badia, Senior Member, IEEE
Index Terms: Age of information, data acquisition, modeling, robust communications, relay.

I. INTRODUCTION
Over the past decade, the Internet of Things (IoT) has begun to slowly integrate with many aspects of everyday life, providing novel services and applications that give citizens, companies, and public administrations an up-to-date awareness of the environment in cities, factories, and homes [2]. The main requirement for these applications is not throughput, or even latency for individual transmissions, but rather freshness: recent data from the sensors should be available to the monitoring application. Age of Information (AoI) is a performance metric that aims to evaluate the freshness of the data updates coming from one or more remote sensing sources [3].
Compared to conventional memoryless metrics such as latency or delay, AoI is better able to characterize not only the Quality of Service (QoS) of the communication system, but also the robustness of the associated network control, and can be connected to the state estimation error as related to the system outage [4], [5]. AoI and related metrics have seen significant interest from the research community over the past decade, and analytical and experimental studies exist for many schemes and communication technologies [6].
In order to provide reliable performance, several works in the literature have considered coding and Automatic Repeat Request (ARQ) strategies to minimize the AoI. The use of repetition in time [7] or over orthogonal communication channels [8] can provide significant reductions of the average and worst-case AoI, but comes at the cost of a higher load on the communication channel. Adding redundancy is also the opposite of what is done by AoI minimization strategies that instead involve dropping outdated packets [9]. As for the performance improvement that ARQ can provide in terms of AoI [10], [11], it is often found to be better than that of error correction coding, as individual updates can be retransmitted only when needed, but it requires a feedback channel, which is often costly or even unavailable for low-power IoT nodes.
IoT nodes often have other constraints to consider, such as limited energy or low-bitrate communication channels: in this case, strategies to minimize AoI are often more complex [12], as they must take the additional requirements into account. Redundant communication schemes increase energy consumption, as the additional transmitted data requires power, so considering transmission costs is crucial. In general, any redundant solution that can relax either communication or other constraints can be beneficial to the information freshness: as an example, it is possible to consider redundancy in terms of energy [13], using a backup energy source for energy-harvesting nodes.
In this work, we consider a scenario in which a sensor is aided by a relay node [14], [15], [16]: transmissions from the sensor may fail, leading to a higher AoI, but the relay can recover from these failures by retransmitting the message, ensuring that it is delivered to the receiver. We analyze the average AoI in this scenario, providing a closed-form solution, and define a game theoretic optimization, in which transmissions from the node and relay have a cost: the Nash equilibrium (NE) of the game between the sensor and relay represents an easily computable, Pareto efficient solution to the problem of optimizing their activity rate [17].
The contributions of the paper are the following:
• We compute the expected AoI and the complete PAoI distribution for both a scenario in which transmissions follow a Bernoulli process and a periodic activation scenario;
• We provide a general game theoretic optimization model, which can be run distributedly by the two nodes without any explicit coordination;
• We define computationally efficient strategies to approximately reach the optimum operation point, which are within the capabilities of even simple devices;
• We verify the analysis by simulation, and draw some design insights from the outcomes in different scenarios.
To the best of our knowledge, this is the first work to consider the distributed optimization of such a relay scenario, including both expected AoI and PAoI violation probability and providing exact theoretical guarantees for performance. Previous works on AoI in relay-aided systems [15], [16], [18] either compute the age without considering the optimization of the network or optimize the scenario in a centralized fashion, using heuristics and approximate formulations of the age.
The rest of this paper is organized as follows. In Sec. II we review related work. The basic communication system model is defined in Sec. III. Sec. IV presents the analysis of the PAoI distribution, while we derive the expected AoI in Sec. V. The game theoretic approach to implement a distributed strategic management of the system to increase its resilience is developed in Sec. VI. Sec. VII then presents the numerical results, and Sec. VIII concludes the paper and presents some possible avenues of future work.

II. RELATED WORK
AoI was first defined as a performance metric for real-time applications in vehicular networks [19]. The general nature of the metric has led to an explosion of interest from the research and industrial communities, with analyses in disparate scenarios [2] using various theoretical models [6]. However, most of the scenarios investigated in the literature involve single links, or sometimes multiple access with competing nodes [20].
Cooperative systems such as vehicular networks, which pursue the shared objective of safe, efficient automated driving by disseminating information in real time over wireless links, are still relatively unexplored in the relevant literature [21]. One possible reason for this is that the coordination of multiple nodes towards AoI minimization is often seen as requiring significant signaling to acquire a global network characterization and perform a stateful optimization. However, IoT scenarios require timely status updates and low energy consumption, which would both be negatively affected by the signaling.
As a possible solution in this sense, we investigate distributed policies that do not require signaling, and optimize the transmission parameters distributedly and robustly [4] by adopting a game theoretical perspective [22]. The nodes can reach a locally efficient solution, i.e., the Nash Equilibrium (NE), without resorting to explicit signaling but just leveraging the common knowledge of each node's rationality. Hence, our approach can be seen as a robust distributed solution.
We remark that game theory was already used in combination with AoI-related objectives of individual nodes. For example, a medium access game was considered in [23], in which the channel follows a simple collision model. Similar investigations, but with more advanced access mechanisms and models, are also developed in [24] for an interference channel and [17] for a collision channel with capture effect. A more detailed analysis, which also considers irregular repetition slotted ALOHA, is presented in [7].
All these works assume that nodes are competing for access, and there is no mutual assistance for relaying. Some points about the relationship of AoI to redundancy or retransmissions can be found in works such as [10] and [11]. Even cooperative scenarios can require trade-offs between individual and system-level objectives: in [25], two sources, both able to provide equivalent updates to the receiver, were considered. A game theoretic model was proposed to capture the inherent laziness of strategic agents, which would refrain from updating (and from paying the associated cost) if they expect the other to do it. This trend is also present in our analysis: as we will see, the source, knowing that the relay improves its reliability, can decrease its activity and act lazily.
Additionally, most scenarios with multiple nodes assume a symmetric scenario. In case of symmetry, game theory may lead to multiple NEs, not all of them being efficient. Thus, a very relevant difference in our analysis is that the scenario is instead asymmetric, as the relay can only intervene after the source [26]. As we show in the following, this can significantly strengthen the analytical results, as we are able to prove the uniqueness of the NE (up to quantization in the discrete case, which may actually lead to two equilibria), and convergence to it is easily achieved through fictitious play [27].
As for the network setup, there are some papers exploring relay channels from an AoI standpoint, even though the perspective is never game theoretic and considers the relay as acting with stateful information and opportunistically (or, it would be best to say, purely driven by altruism and bearing no cost). For example, in [28], a relay is considered to act as an intermediary between a number of sensor and destination nodes, and the problem is to minimize the age of information through an optimal scheduling policy. However, there is no direct communication between source and destination nodes, and the relay node is the only resource manager. In [29], a similar scenario is considered, but the communication is bidirectional and the relay is two-way. The use of short blocklength codes with multiple parallel relays was also explored in recent work [18], [30], potentially optimizing the packet length, the retransmission procedures over both links, and the number of selected relays to control the age-energy trade-off.
Conversely, [14] considers two network models, one of which is the same as ours, i.e., a three-node relay network with a direct channel and a path through the relay (the other being a symmetric two-relay network with two paths, each going through one of the relays), in which the relay-aided transmission incurs an AoI increased by 1 time slot. They also focus on static scheduling policies, i.e., the transmission probabilities of nodes, and derive AoI expressions through Markovian jump linear systems similar to ours. However, their focus is on investigating and directly deriving AoI-optimal scheduling policies, without a game theoretic investigation. In this sense, source and relay nodes are fully coordinated entities. Furthermore, they only consider the expected AoI, with no consideration for worst-case performance.
The same model is also considered in [31], in which the authors additionally compare the case of pure time-division multiple access, where transmissions must be sequential between the source and the relay, thus resulting in the AoI of relay-aided transmissions being higher, and the extension to simultaneous transmissions via non-orthogonal medium access. Also, [15] considers a similar relay channel model, but their focus is on an opportunistic relay (not controlled by a rational player) and the only optimization takes place in the choice of the update generation probability p by the source, which is chosen to minimize the AoI. This very same model is also analyzed in [16] from the perspective of a stateful stochastic optimization of the system, with nodes acting in a coordinated fashion.
All these investigations prove that, while communication through a relay has the potential to improve information freshness, it is unclear how to implement it distributedly and without signaling when both nodes also incur energy costs for their activity. On the other hand, our game theoretic approach avoids the need for idealization and places the analysis in a more realistic context of IoT nodes with distributed management [4].

III. SYSTEM MODEL
We consider an IoT sensor and a gateway exchanging status updates over a time-slotted wireless channel. The sensor follows an update-at-will model, in which fresh information is always available to the sensor, but there are strict energy constraints, which limit the frequency of transmissions. As the sensor operates in a duty cycle mode, we consider two modes for packet generation: (a) Random reporting: the sensor follows a Bernoulli process, waking up and transmitting a packet with probability $p$; (b) Periodic reporting: the sensor wakes up once every $T$ slots and reports the current value. As commonly done in the literature [6], [17], we neglect the transmission delay in the information exchange, so transmitted packets are generated at the beginning of a slot and received in the same time slot, with no delay. Introducing a latency of 1 slot for each transmission would not change the results, simply increasing the AoI by a fixed value of 1. All considerations in the following sections remain valid for a system with a transmission delay, and our choice of neglecting it is entirely for the sake of readability and mathematical simplicity. However, we do take fluctuations of the wireless channel into account: as fading and interference may prevent the receiver from decoding the packet, we model the channel as a Packet Erasure Channel (PEC) with erasure probability $f$, which is known to both the sensor and receiver.
In the following, we will denote vectors in bold, e.g., as $\mathbf{x}$, and their elements using a subscript, e.g., $x_i$. Random variables will be represented by capital letters, e.g., $X$, and their Probability Mass Function (PMF) or Probability Density Function (PDF) will be denoted as $p_X(x)$. The corresponding Cumulative Distribution Function (CDF) will be denoted as $P_X(x)$.
We consider the presence of a relay node [32], as shown in Fig. 1, which is known to improve the resilience of the transmission and achieve lower delays. The relay is randomly and independently active with probability $b$ due to energy limitations: as coordinating with the gateway would require additional signaling, the relay does not know the status or AoI of the sensor. As a relay might be required to serve multiple sensors and might not even know the activity pattern of the sensor without explicit synchronization, it will be randomly active both when the sensor is randomly active and when its reporting is periodic. In the latter case, we will also consider a fully synchronized scheme, in which the relay is active once every $M$ sensor periods. When it is active, the relay listens to the channel used by the sensor to transmit, decodes the transmitted packet if the sensor is also active, and relays it to the receiver via a reliable out-of-band exchange. This way, the relay can receive the packet in one time slot and retransmit it in the next one in a decode-and-forward fashion; naturally, this introduces an additional delay of one time slot. In this case, the latency of the transmitted packet is 1 instead of 0. This model can also represent a random repetition code [7] with Maximal Ratio Combining (MRC) decoding. The relay needs to be able to receive a new packet from the sensor while relaying the previous one to the gateway, but as the reception and transmission are over different frequency bands, this does not require it to have full-duplex capabilities.
In-band relaying systems have already been prototyped in practical systems such as LoRaWAN [33], [34], although using a different spreading factor for the second hop would have the same effect as out-of-band communication. Several relaying IoT solutions, including both in-band and out-of-band relays, have been proposed in the literature [35] over a wide variety of technologies. In particular, the use of drones as relays has been explored [36] widely and over different applications; drones almost always operate out-of-band, as they use a different technology to connect to the gateway.
Note that the retransmission by the relay is assumed to be always successful, since it takes place on an orthogonal reliable communication channel, but it would be trivial to include independent and identically distributed (i.i.d.) failures on this side too, by simply rescaling the value of $b$. In other words, if the relay has an i.i.d. failure rate $h$, we can effectively replace $b$ with $b(1-h)$ in the following.
If the $i$-th packet from the sensor is generated at time $G_i$, we can consider the reception time $R_i$, distinguishing three cases. The packet is either received directly from the sensor, in which case $R_i = G_i$, erased and recovered through the relay, with $R_i = G_i + 1$, or erased and not relayed. Conventionally, a packet that is never received has infinite latency, and its reception instant is $+\infty$. The conditional distribution of $R_i$, for a given value of $G_i$, is then:
$$p_{R_i|G_i}(r|g) = \begin{cases} 1-f, & r = g; \\ fb, & r = g+1; \\ f(1-b), & r = +\infty. \end{cases} \tag{1}$$
Naturally, this does not hold in the fully synchronized case, as the PMF depends on the relay's random activation. In that case, the PMF is simple, as the activation of the relay is deterministic. The latency for packet $i$ is then given by $T_i = R_i - G_i$. We can also define the set $\mathcal{R}(t)$, containing the indices of the packets that have been correctly received by time $t$:
$$\mathcal{R}(t) = \{i : R_i \leq t\}. \tag{2}$$
Thus, the value of AoI at the receiver in time slot $t$ is [3]:
$$\Delta(t) = t - \max_{i \in \mathcal{R}(t)} G_i, \tag{3}$$
which implies a sawtooth pattern for the evolution of AoI [37], [38], decreasing to $T_i$ after packet $i$ is delivered and linearly growing until the next packet reception. We can define the expected AoI, $\Delta$, as:
$$\Delta = \limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \Delta(\tau). \tag{4}$$
Finally, the PAoI $\Psi_i$ is the AoI at the instant of the reception of packet $i$, measured with respect to the latest packet received before it:
$$\Psi_i = R_i - \max_{j \in \mathcal{R}(G_i) \setminus \{i\}} G_j. \tag{5}$$
PAoI is defined only for successfully delivered packets. In the following, we will consider both the average AoI and the PAoI threshold violation probability as system Key Performance Indicators (KPIs).
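The sawtooth evolution described above can be illustrated with a short simulation. This is only a minimal sketch under the stated model (random reporting with probability p, erasure probability f, relay active with probability b, relay delivery one slot later); the function name `simulate_aoi` and the choice of starting with a packet generated at time 0 are assumptions made for illustration.

```python
import random

def simulate_aoi(p, f, b, slots, seed=0):
    """Simulate the AoI at the receiver, slot by slot (hypothetical helper).

    p: sensor activation probability, f: erasure probability,
    b: relay activation probability. Returns the list of AoI values.
    """
    rng = random.Random(seed)
    last_gen = 0      # generation time of the freshest packet at the receiver
    pending = None    # generation time of a packet the relay delivers next slot
    ages = []
    for t in range(1, slots + 1):
        # A packet overheard by the relay in the previous slot arrives now.
        if pending is not None:
            last_gen = max(last_gen, pending)
            pending = None
        if rng.random() < p:              # the sensor is active in this slot
            erased = rng.random() < f     # the direct link erases the packet
            relay_on = rng.random() < b   # the relay is listening
            if not erased:
                last_gen = t              # direct delivery, latency 0
            elif relay_on:
                pending = t               # recovered by the relay, latency 1
        ages.append(t - last_gen)
    return ages

if __name__ == "__main__":
    trace = simulate_aoi(p=0.3, f=0.4, b=0.5, slots=20)
    print(trace)
```

Averaging the returned trace over a long horizon gives an empirical estimate of the expected AoI, which can be compared with the closed-form results of Sec. V.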
The other KPI that we consider is energy: we assume that both the sensor and relay pay a fixed energy cost, represented by constant $c$ for the sensor and $k$ for the relay, every time they activate. In the following, we will study the trade-off between the timing performance, which improves if the nodes are active more often, and the expected energy consumption per slot.
We assume that both the sensor and the relay node are controlled by strategic agents operating with the aim to minimize the AoI at the receiver's side. At the same time, we also consider activity costs for both the sensor and the relay. We leverage and expand analytical results for AoI and PAoI in the presence of independent random transmissions over a slotted channel. From a performance evaluation perspective, we discuss how our problem can be framed as a potential game [39], whose NE is found to be an efficient trade-off between achieving fresh information and avoiding excessive costs. At the same time, the strategic interaction between the two agents can take place without any explicit exchange of control information, which makes our approach particularly suitable for distributed robust implementations.
We also highlight that our system requires no feedback channel towards the sensor or the relay: as transmission is random, and the relay operates independently, the sensor can become active in a given slot, obtain a measurement, transmit its value, and return to sleep mode, without needing to remain awake and receive feedback. As reception may require almost as much power as active transmission [40] for low-power sensors, this can significantly extend the lifetime of the sensor's battery with respect to a scheme relying on feedback or explicit coordination.

IV. PEAK AGE OF INFORMATION
We can now derive the distribution of the PAoI $\Psi$. As above, we first consider the random reporting case, then extend the results to periodic reporting. In the following, for notational convenience, we will set $q = 1 - b$, representing the probability that the backup is inactive, and $\sigma = 1 - fq$, representing the overall success probability of a packet.

A. Random Reporting
Let us consider packet $i$, which is successfully received. The PAoI $\Psi_i$ represents the time between the generation of the latest received packet before $i$ and the reception of packet $i$. Following the definition in (5), the PAoI is equivalent to
$$\Psi_i = T_i + G_i - \max_{j \in \mathcal{R}(G_i) \setminus \{i\}} G_j = T_i + \Theta_i.$$
We note that the inter-packet interval $\Theta_i = G_i - \max_{j \in \mathcal{R}(G_i) \setminus \{i\}} G_j$ corresponds to the AoI $\Delta(G_i)$ in the case in which packet $i$ is not immediately transmitted. We then give the PMFs of the latency $T_i$ (considering only successfully received packets, as the PAoI is undefined for lost packets) and of the inter-packet interval $\Theta_i$, noting that the two are statistically independent, in the following Lemmas.
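Lemma 1: Under a successful transmission of packet $i$, the PMF of the latency $T_i$ is
$$p_{T_i}(t_i) = \begin{cases} \frac{1-f}{\sigma}, & t_i = 0; \\ \frac{f(1-q)}{\sigma}, & t_i = 1; \\ 0, & \text{otherwise}. \end{cases} \tag{6}$$
Proof: The PMF is derived directly from (1), under the condition that the packet is successfully received (either directly or through the relay), i.e., that $R_i$ is not infinite.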
Lemma 2: The PMF of the inter-packet interval $\Theta_i$ is given by:
$$p_{\Theta_i}(\theta_i) = p\sigma(1 - p\sigma)^{\theta_i - 1}, \quad \theta_i \in \mathbb{N}^+. \tag{7}$$
Proof: The inter-packet interval follows a simple geometric distribution: in order for the closest previous packet to be generated $\theta_i$ slots before, there must be no successful packets in between, and the probability is the same for all packets.
Theorem 1: The PMF of the PAoI $\Psi_i$ for packet $i$, received successfully either directly from the sensor or through the relay, is:
$$p_{\Psi_i}(\psi_i) = \begin{cases} p(1-f), & \psi_i = 1; \\ p(1-p\sigma)^{\psi_i - 2}\left[(1-f)(1-p\sigma) + f(1-q)\right], & \psi_i \geq 2; \\ 0, & \text{otherwise}. \end{cases} \tag{8}$$
Proof: We know that $\Psi_i = \Theta_i + T_i$, and $\Theta_i \geq 1$. Consequently, the PAoI can never be 0. In the case $\psi_i = 1$, we need packet $i$ to be transmitted with latency 0 and $\Theta_i$ to be 1. In all other cases, since the inter-packet interval and the latency are independent, we can directly apply a convolution to solve the case in which $\psi_i \geq 2$:
$$p_{\Psi_i}(\psi_i) = \sum_{t=0}^{1} p_{T_i}(t)\, p_{\Theta_i}(\psi_i - t). \tag{9}$$
The actual value of the PMF is then simply derived by substituting the results from Lemma 1 and Lemma 2 into (9).
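Corollary 1: The CDF of the PAoI $\Psi_i$ for packet $i$, received successfully either directly from the sensor or through the relay, is given by:
$$P_{\Psi_i}(\psi_i) = 1 - \frac{(1-p\sigma)^{\psi_i - 1}\left[(1-f)(1-p\sigma) + f(1-q)\right]}{\sigma}, \quad \psi_i \in \mathbb{N}^+. \tag{10}$$
Proof: The CDF can be promptly computed from the PMF in Theorem 1, by summing the geometric series over all values up to $\psi_i$.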
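The convolution argument used for the random reporting case can be checked numerically. The sketch below assumes a two-point latency distribution (0 with probability (1-f)/σ, 1 with probability fb/σ) and a geometric inter-packet interval with parameter pσ, as implied by the proofs of Lemmas 1 and 2; all function names are hypothetical.

```python
def latency_pmf(f, b):
    """PMF of the latency of a delivered packet: 0 (direct) or 1 (via relay)."""
    sigma = 1 - f * (1 - b)              # overall success probability
    return {0: (1 - f) / sigma, 1: f * b / sigma}

def interval_pmf(theta, p, f, b):
    """Geometric PMF of the inter-packet interval, defined for theta >= 1."""
    rho = p * (1 - f * (1 - b))          # per-slot probability of a delivered packet
    return rho * (1 - rho) ** (theta - 1) if theta >= 1 else 0.0

def paoi_pmf(psi, p, f, b):
    """PMF of the PAoI, as the convolution of interval and latency."""
    lat = latency_pmf(f, b)
    return sum(w * interval_pmf(psi - t, p, f, b) for t, w in lat.items())

def paoi_violation(threshold, p, f, b):
    """Probability that the PAoI exceeds the given threshold."""
    return 1 - sum(paoi_pmf(k, p, f, b) for k in range(1, threshold + 1))

if __name__ == "__main__":
    for psi in range(1, 6):
        print(psi, paoi_pmf(psi, p=0.5, f=0.2, b=0.5))
    print("P(PAoI > 10) =", paoi_violation(10, p=0.5, f=0.2, b=0.5))
```

The violation probability returned by `paoi_violation` is the complement of the CDF, i.e., the PAoI threshold violation KPI mentioned in Sec. III.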

B. Periodic Reporting
In the periodic reporting case, the PAoI is simpler, and it depends on the number of failed transmissions between two subsequent successes. We have to consider the case in which the packet is recovered through the relay, as well as the one in which the transmission from the sensor is successful.
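Theorem 2: The PMF of the PAoI $\Psi_i$ in a periodic reporting system is:
$$p_{\Psi_i}(\psi_i) = \begin{cases} (1-f)(fq)^{n-1}, & \psi_i = nT,\ n \in \mathbb{N}^+; \\ f(1-q)(fq)^{n-1}, & \psi_i = nT + 1,\ n \in \mathbb{N}^+; \\ 0, & \text{otherwise}, \end{cases} \tag{11}$$
where $\mathbb{N}^+ = \mathbb{N} \setminus \{0\}$.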
Proof: As we know that the success probability is $\sigma$, the probability of having $n$ frames between two subsequent successful transmissions follows a geometric distribution with parameter $\sigma$. Since the reception of the packet may be either direct or through the relay, we can perform the convolution between the inter-packet interval distribution (which is non-zero only if the argument is a multiple of the period $T$) and the latency distribution, which is the same as for the random reporting case and is given in Lemma 1. The resulting PMF is then the one given in the theorem.
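Corollary 2: The CDF of the PAoI $\Psi_i$ in a periodic reporting system is given by:
$$P_{\Psi_i}(\psi_i) = \begin{cases} 1 - f(fq)^{\psi_i/T - 1}, & \text{mod}(\psi_i, T) = 0; \\ 1 - (fq)^{\lfloor \psi_i/T \rfloor}, & \text{otherwise}, \end{cases} \quad \psi_i \in \mathbb{N}^+. \tag{12}$$
Proof: As for the random reporting case, the CDF can be easily derived from the PMF from Theorem 2.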
The PAoI distribution for the special case with T = 1 needs to be considered separately, but is trivial to compute, and omitted here due to space limitations.
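For periodic reporting with T > 1, the PMF can be evaluated directly by splitting the PAoI into a number of periods and a latency of 0 or 1. The following sketch assumes that n periods separate two deliveries with geometric probability (parameter σ) and that the latency follows the two-point distribution of Lemma 1; the function name is hypothetical.

```python
def periodic_paoi_pmf(psi, T, f, b):
    """PMF of the PAoI for periodic reporting with period T > 1."""
    if T <= 1:
        raise ValueError("the case T = 1 needs a separate treatment")
    q = 1 - b                        # probability that the relay is inactive
    n, r = divmod(psi, T)            # psi = n * T + r
    if n < 1:
        return 0.0                   # the PAoI is at least one period
    if r == 0:                       # direct delivery after n - 1 losses
        return (1 - f) * (f * q) ** (n - 1)
    if r == 1:                       # relay delivery after n - 1 losses
        return f * b * (f * q) ** (n - 1)
    return 0.0

if __name__ == "__main__":
    T, f, b = 4, 0.3, 0.5
    support = [k for k in range(1, 14) if periodic_paoi_pmf(k, T, f, b) > 0]
    print(support)                   # multiples of T and their successors
```

The PMF is non-zero only at multiples of T and at the slot right after them, which is why the case T = 1 (where these two sets overlap) has to be handled separately.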

C. Full Synchronization
We can now consider the fully synchronized system. In this case, the relay is active once every $M$ sensor periods, bounding the PAoI. We first define the value $M_i$, i.e., the number of transmission periods elapsed between the latest relay activation and the generation time of the latest successfully received packet before packet $i$. Counting periods so that the relay is active in those whose index is a multiple of $M$, we have:
$$M_i = \text{mod}\left(\frac{1}{T} \max_{j \in \mathcal{R}(G_i) \setminus \{i\}} G_j,\ M\right), \tag{13}$$
where $\text{mod}(m, n)$ is the integer modulo operation.
Lemma 3: The conditional PMF of the PAoI for the fully synchronized system for a given value of $M_i$ is given by:
$$p_{\Psi_i|M_i}(\psi_i|m_i) = \begin{cases} (1-f)f^{n-1}, & \psi_i = nT,\ n \in \{1, \ldots, M - m_i\}; \\ f^{M-m_i}, & \psi_i = (M - m_i)T + 1; \\ 0, & \text{otherwise}. \end{cases} \tag{14}$$
Proof: The derivation of the conditional PMF is simple: if we have $M - m_i$ consecutive failures, the packet will always be retransmitted by the relay, by the definition of $M_i$, and the PAoI will be $(M - m_i)T + 1$. The probability of this happening is $f^{M-m_i}$, as transmission successes are independent. On the other hand, the relay is inactive for earlier packets, so the PAoI distribution is a truncated geometric with success probability $1 - f$ and a timestep $T$ between attempts.
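Lemma 4: The PMF of $M_i$ is given by:
$$p_{M_i}(m_i) = \begin{cases} \frac{1}{1 + (M-1)(1-f)}, & m_i = 0; \\ \frac{1-f}{1 + (M-1)(1-f)}, & m_i \in \mathcal{N}(M-1); \\ 0, & \text{otherwise}, \end{cases} \tag{15}$$
where $\mathcal{N}(m) = \{1, \ldots, m\}$, $m \in \mathbb{N}^+$.
Proof: The PMF can be derived by applying Bayes' theorem, considering that packets with $m_i = 0$ are always successful, as the relay is active, while other packets are delivered with probability $1 - f$.
Theorem 3: The PMF of the PAoI in the fully synchronized system is given by:
$$p_{\Psi_i}(\psi_i) = \sum_{m=0}^{M-1} p_{M_i}(m)\, p_{\Psi_i|M_i}(\psi_i|m). \tag{16}$$
Proof: The correctness of the formula can be easily verified by applying the law of total probability to the conditional distribution from Lemma 3, using the marginal distribution from Lemma 4.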
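The law of total probability used in the proof of Theorem 3 can be sketched as follows. The code assumes T > 1, a truncated geometric conditional PMF with a final relay-delivered term (as described in the proof of Lemma 3), and a marginal in which packets generated in relay-active periods are always delivered while the others are delivered with probability 1 - f (as described in the proof of Lemma 4). The function names are hypothetical.

```python
def conditional_paoi_pmf(psi, m, T, M, f):
    """PMF of the PAoI given that the previous delivery was m periods
    after the latest relay activation (assumes T > 1)."""
    limit = M - m                    # periods until the next relay activation
    n, r = divmod(psi, T)
    if r == 0 and 1 <= n <= limit:
        return (1 - f) * f ** (n - 1)    # direct delivery at the n-th attempt
    if r == 1 and n == limit:
        return f ** limit                # all attempts failed, relay delivers
    return 0.0

def offset_pmf(m, M, f):
    """PMF of the offset m of a delivered packet within the relay cycle."""
    denom = 1 + (M - 1) * (1 - f)
    return (1.0 if m == 0 else 1 - f) / denom

def sync_paoi_pmf(psi, T, M, f):
    """PMF of the PAoI in the fully synchronized system."""
    if T <= 1:
        raise ValueError("this sketch assumes T > 1")
    return sum(offset_pmf(m, M, f) * conditional_paoi_pmf(psi, m, T, M, f)
               for m in range(M))

if __name__ == "__main__":
    T, M, f = 3, 4, 0.4
    for psi in range(1, M * T + 2):
        print(psi, round(sync_paoi_pmf(psi, T, M, f), 4))
```

Since the relay always recovers the packet generated in its active period, the PAoI in this sketch never exceeds MT + 1, which is the bound mentioned at the beginning of this subsection.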

V. EXPECTED AGE OF INFORMATION
We now derive closed-form expressions for the expected AoI in the system we discussed above, in which a sensor sends updates to a gateway through a PEC with erasure probability $f$. The sensor is aided by a relay node, which is active with probability $b$ and does not actively coordinate with the sensor. The relay retransmits packets that are sent while it is active through an out-of-band channel, enabling the reception of initially lost packets with some additional latency.

A. Random Reporting
We first consider the random reporting system, in which the sensor updates follow a Bernoulli process with probability p.
Lemma 5: The expected AoI for a random reporting system without a relay is
$$\Delta = \frac{1 - p(1-f)}{p(1-f)}. \tag{17}$$
Proof: The AoI evolution can be seen as a renewal process and, in this context, the periods between successful updates are defined as cycles, since they reset the AoI to 0. The cost of the renewal process is also the AoI. Thus, we can compute the expected AoI by dividing the expected total cost over a cycle by the average duration of a cycle. Following [41], the expected AoI for a sensor whose updates follow a Bernoulli process with probability $\rho$ can be computed as:
$$\Delta = \frac{1}{\rho} - 1. \tag{18}$$
As failures are independent from the update process, the thinned process considering only successfully received updates is still a Bernoulli process, with probability $\rho = p(1 - f)$, which we can substitute into (18) and prove the Lemma.
An entirely equivalent formulation can be obtained by looking at the intervals between any two updates, not just successful ones, so that the AoI does not reset to 0, which simplifies inserting the relay in the analysis.
Remark 1: Since the system is kept the same, computing the expected AoI as the average cost over an interval divided by the average length of the interval itself will yield the same result if we consider (a) a renewal cycle between successful updates; or (b) any inter-update interval.
If we take approach (b), for a random reporting system with probability $p$, the inter-update interval has length $p^{-1} - 1$. However, the AoI, which represents the cost, does not always reset to 0 at the end of each interval. Consequently, we say that for those intervals that do not follow a successful transmission, the AoI starts with a bias, i.e., the cost is increased because of the previous failures. The expected cost over a period following a successful update is $(p^{-1} - 1)^2$, to which previous failures add an extra expected cost. This means that, under this second approach, we can compute $\Delta$ as the sum of $p^{-1} - 1$ and a bias $\beta$ due to previous failures, equal to $f$ times a geometric number of slots until reaching a success, with probability $p(1 - f)$, averaging over the number of slots, which leads to:
$$\Delta = p^{-1} - 1 + f \sum_{j=0}^{\infty} (j+1)\, p(1-f) \left(1 - p(1-f)\right)^{j}. \tag{19}$$
The solution of the series in (19) is the result in (17). The same holds for the periodic reporting system.
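The equivalence between the two approaches of Remark 1 can be verified numerically for the system without a relay. The sketch below assumes that the renewal result is 1/(p(1-f)) - 1 and that the bias is f times the mean of a geometric number of slots with success probability p(1-f), as described in the text; both function names are hypothetical.

```python
def aoi_no_relay_renewal(p, f):
    """Expected AoI from the renewal argument, with rho = p * (1 - f)."""
    rho = p * (1 - f)
    return 1 / rho - 1

def aoi_no_relay_bias(p, f, terms=5000):
    """Expected AoI as 1/p - 1 plus a bias term, summed as a truncated series."""
    rho = p * (1 - f)
    bias = f * sum((j + 1) * rho * (1 - rho) ** j for j in range(terms))
    return 1 / p - 1 + bias

if __name__ == "__main__":
    p, f = 0.3, 0.4
    print(aoi_no_relay_renewal(p, f), aoi_no_relay_bias(p, f))
```

The truncation at a few thousand terms is an arbitrary choice: the geometric tail decays fast enough for the two values to agree to numerical precision for the parameters shown.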
We can then use this method to account for the relay when computing the expected AoI.
Theorem 4: The expected AoI in a random reporting system with a relay is:
$$\Delta(p, b) = \frac{1 - p(1-f)}{p(1 - fq)}. \tag{20}$$
Proof: Following the approach from Remark 1, we can compute the expected AoI as the sum of $p^{-1} - 1$ and a bias. If we consider the relay, the bias can be computed through three different terms, all of which are only included if the transmission from the sensor fails, so we always have a coefficient $f$. The terms are then as follows:
1) If the backup is active for the update, the bias is simply equal to $\beta_1 = 1$, as the AoI is reset to 1 in the slot following the failed transmission. This happens with probability $\pi_1 = f(1 - q)$;
2) If the backup is inactive, which happens with probability $q$, and the last successful update from the sensor was $j + 1$ slots ago, the bias is the same as in (19). In this case, the probability of the bias being equal to $\beta_2(j) = j + 1$ is $\pi_2(j) = pfq(1-f)(1 - p\sigma)^j$, as we must consider both the cases with no transmission and with a failed transmission without a backup;
3) If the last successful update $j + 1$ slots ago was from the relay, i.e., the transmission of the sensor failed, but the relay retransmitted it correctly, the bias is $\beta_3(j) = j + 2$, with probability $\pi_3(j) = pf^2 q(1 - q)(1 - p\sigma)^j$.
Combining the three terms, we find that the bias is equal to:
$$\beta = \pi_1 \beta_1 + \sum_{j=0}^{\infty} \left[\pi_2(j)\beta_2(j) + \pi_3(j)\beta_3(j)\right] = \frac{f(p + q - pq)}{p\sigma}.$$
This promptly leads to the Theorem statement (20). We note that the effect of the backup at the relay node is localized in the addition of a $q$ term in the denominator to the result for the system without a relay as given in (17). Naturally, the result of (20) implies that when $q = 1$, i.e., the relay never performs a backup, the expected AoI $\Delta(p, 0)$ is the same as the one derived in (17). On the other hand, when $q = 0$, i.e., the relay is always active, we get $\Delta(p, 1) = p^{-1} - 1 + f$, which is consistent with failures causing a fixed increase of 1 in the AoI, since the relay is error-free, but introduces a delay of 1 slot, thus increasing the expected AoI by $f$.
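The three-term bias decomposition can be checked against a closed form with a few lines of code. The sketch assumes the closed form (1 - p(1-f)) / (p(1 - fq)), which matches the two limiting cases discussed above, and assumes that the second term has bias j + 1 with probability pfq(1-f)(1-pσ)^j, by analogy with the third term; function names are hypothetical.

```python
def expected_aoi_random(p, f, b):
    """Assumed closed form of the expected AoI with a relay."""
    q = 1 - b
    return (1 - p * (1 - f)) / (p * (1 - f * q))

def bias_from_terms(p, f, b, terms=10_000):
    """Bias obtained by summing the three terms of the proof (truncated)."""
    q = 1 - b
    sigma = 1 - f * q
    bias = f * (1 - q) * 1                                  # term 1: relay active
    for j in range(terms):
        decay = (1 - p * sigma) ** j                        # no delivery for j slots
        bias += (j + 1) * p * f * q * (1 - f) * decay       # term 2: last delivery direct
        bias += (j + 2) * p * f**2 * q * (1 - q) * decay    # term 3: last delivery relayed
    return bias

if __name__ == "__main__":
    p, f, b = 0.3, 0.4, 0.5
    print(expected_aoi_random(p, f, b), 1 / p - 1 + bias_from_terms(p, f, b))
```

Setting b = 0 or b = 1 in `expected_aoi_random` reproduces the two limiting cases mentioned in the text, which is a quick sanity check on the assumed formula.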

B. Periodic Reporting
We then consider the periodic reporting system, in which updates are generated deterministically every $T$ slots. We can apply the same method to derive the expected AoI.
Lemma 6: The expected AoI for a periodic reporting system without a relay is given by:

Proof: As for the random reporting system, we can compute the expected AoI by dividing the expected total cost by the expected duration of a cycle. Since there is no relay, the number of periods $n$ between two successive packet receptions is geometrically distributed with probability $1 - f$, and the expected cycle duration is $T/(1 - f)$. The expected AoI is then the ratio between the expected total cost and this duration, which is equivalent to the thesis, completing the proof.
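The renewal argument in the proof can be reproduced numerically. The sketch below is an illustration under two assumptions that are not stated in this passage: the age restarts from 0 at each reception (which matches the value $(T-1)/2$ quoted for $f = 0$), and the geometric sum is truncated at a hypothetical `n_max`. The closed form in `closed_form` is our own evaluation of the ratio, $(T-1)/2 + Tf/(1-f)$, and should be compared against the Lemma statement rather than read as a quotation of it.

```python
def periodic_aoi_no_relay(T, f, n_max=2000):
    """Renewal-reward ratio E[total age] / E[cycle length], truncated sum."""
    exp_cost = 0.0   # expected total age accumulated over one cycle
    exp_len = 0.0    # expected cycle duration
    for n in range(1, n_max + 1):
        prob = (1 - f) * f ** (n - 1)      # n periods between two receptions
        length = n * T
        # The age takes the values 0, 1, ..., nT - 1 during the cycle.
        exp_cost += prob * length * (length - 1) / 2
        exp_len += prob * length
    return exp_cost / exp_len


def closed_form(T, f):
    """Our evaluation of the same ratio (assumption, see lead-in)."""
    return (T - 1) / 2 + T * f / (1 - f)


if __name__ == "__main__":
    print(periodic_aoi_no_relay(5, 0.3), closed_form(5, 0.3))
```

The expected cycle length computed inside the loop converges to $T/(1-f)$, as stated in the proof.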
We can then use the same approach of Remark 1 to compute the bias necessary to adjust for the presence of the relay.

Theorem 5: The expected AoI in a periodic reporting system with a relay is given by:

Proof: Following the approach from Remark 1, we can compute the expected AoI as the sum of $(T - 1)/2$, i.e., the expected AoI if $f = 0$, and a bias term. If we consider the relay, there are two bias terms: the first is due to recovered failures, i.e., packet losses during slots in which the relay is active and the gateway receives the packet with an additional delay, while the second is due to unrecovered failures. The two terms are then as follows:
1) If the relay is active and the last successful update from the sensor was $nT$ slots ago, the AoI grows to $nT$ instead of resetting immediately, then resets to 1. The bias term is then $\beta_1(n) = n$, as the additional cost is divided by the renewal period $T$. This happens with probability $\pi_1(n) = f(1 - q)\sigma(1 - \sigma)^{n-1}$;
2) If the relay is inactive and the last successful update from the sensor was $nT$ slots ago, the AoI in the next renewal cycle will increase by a factor $nT$. We then have $\beta_2(n) = nT$, as the additional cost is incurred in all slots of the cycle. This case occurs with probability $\pi_2(n) = \sigma(1 - \sigma)^n$, as the delay of the previous transmission does not matter.
Combining the two terms gives the overall bias. If we set $q = 1$, i.e., the relay is never active, the expected AoI corresponds to the result in Lemma 6.
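As a sanity check on the two bias terms, the sketch below sums them in closed form and compares the result with a simulation. Several elements are assumptions made for illustration: $\sigma$ is defined outside this passage and is taken here to be $1 - fq$ (the probability that an update is eventually delivered); the slot model is the same simplified one used for the random reporting case (relay delivery one slot after a failure, age measured from 0 at the end of each slot); function names, seed, and run length are hypothetical. The sums used are $\sum_{n \geq 1} n\sigma(1-\sigma)^{n-1} = 1/\sigma$ and $\sum_{n \geq 1} n\sigma(1-\sigma)^{n} = (1-\sigma)/\sigma$.

```python
import random


def bias_sum_prediction(T, f, q):
    """(T-1)/2 plus the two summed bias terms, assuming sigma = 1 - f*q."""
    sigma = 1 - f * q
    recovered = f * (1 - q) / sigma        # sum of beta_1(n) * pi_1(n)
    unrecovered = T * (1 - sigma) / sigma  # sum of beta_2(n) * pi_2(n)
    return (T - 1) / 2 + recovered + unrecovered


def simulate_periodic_reporting(T, f, q, periods=50_000, seed=1):
    """Time-average AoI with one update every T slots (assumed slot model)."""
    rng = random.Random(seed)
    last_gen = 0
    pending = None
    total_age = 0
    slots = periods * T
    for t in range(1, slots + 1):
        if pending is not None:            # backup delivered one slot later
            last_gen = max(last_gen, pending)
            pending = None
        if t % T == 0:                     # deterministic update generation
            if rng.random() >= f:
                last_gen = t
            elif rng.random() >= q:
                pending = t
        total_age += t - last_gen
    return total_age / slots


if __name__ == "__main__":
    print(simulate_periodic_reporting(5, 0.3, 0.5), bias_sum_prediction(5, 0.3, 0.5))
```

With $q = 1$ the prediction reduces to $(T-1)/2 + Tf/(1-f)$, in line with the remark that the result then corresponds to the no-relay case.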

C. Full Synchronization
We can now consider the fully synchronized system, exploiting the results on the PAoI.
Lemma 7: If we consider the cycle ending with the reception of packet i, the inter-packet distance X i = G i − max j∈R(Gi−1) G j has the following PMF: The expected duration of a cycle E [X] is given by: Proof: For a given M i , the conditional cycle duration distribution is given by: The proof of this follows the proof of Lemma 3. We can then simply apply the law of total probability using the steady-state distribution of M i from Lemma 4 to get the PMF in (26).
The expected value can then be easily verified by solving the corresponding sum over the PMF in (26).

Theorem 6: The expected AoI ∆ for the fully synchronized system is given by (30), where the multiplying term ζ is defined in (31).

Proof: The total AoI $A_i$ over the cycle ending with the reception of packet $i$ follows the standard method from the discrete-time queue literature [42]: the age is accumulated over the cycle that ends with the peak age $\Psi_i$, and the expected AoI is the ratio $E[A]/E[X]$ between the expected total AoI and the expected cycle duration. We can then use the PMF from Theorem 3 to get the PMF of $A_i$ and compute $E[A]$. The solution of the sum is quite cumbersome, but conceptually simple, and leads to the result in (30).

VI. GAME THEORETIC OPTIMIZATION
Similar to [22] and [41], we investigate the optimization of the sensor's duty cycle, represented by its activation probability $p$ in a random reporting system, and the inverse of its period $T$ in a periodic reporting system. In the following, we will often refer to $p$ for both kinds of system, with $p = T^{-1}$ for the periodic reporting case: this slight abuse of notation allows us to avoid repeating the same definitions twice.
We assume that the sensor incurs a cost $c$ every time it sends an update, implying that the expected cost paid by the sensor is $cp$. This cost term can model the expenditure of a finite resource by the sensor (e.g., energy in a battery-powered or energy-harvesting sensor) [13], but also the use of the shared wireless medium and of the resources of the relay [25]. From a network control perspective, situations when the sensor constantly sends updates represent a waste of bandwidth. Thus, the cost can be seen as a way to regulate the sensor activity.
The cost must then be traded off against a system-level penalty ξ(p, b), representing the performance degradation that results from using stale information. We may express this penalty in two ways:
1) We may consider an average-sense optimization, measuring the system-level penalty with the expected AoI, i.e., ξ(p, b) = ∆(p, b);
2) We may consider a risk minimization scenario, in which the system-level penalty is represented by the probability of violating a threshold PAoI, i.e., $\xi(p, b) = 1 - P_{\Psi_i}(\psi_{\max})$.
Since the system-level penalty and the energy cost are both objectives to minimize, the utility function of the source can be described as $u_S(p, b) = -\xi(p, b) - cp$, following the standard game theoretical convention that models the players as utility maximizers instead. We remark that $u_S(p, b)$, coherently with the usual requirements of utility theory, is written as a function of both $p$ and $b$: the dependence on $b$ is caused by the fact that both definitions of ξ(p, b) depend on $q$, which is defined as $1 - b$. This means that we can account for the beneficial impact that the relay node and its backups have on the AoI, which in turn allows the sensor to transmit more often, in spite of this causing an increased cost $cp$.
Following the same logic, the relay node also incurs a cost for every time slot in which it is active, and we denote it as a coefficient $k$. The expected energy cost for the relay node is then $kb$, and the utility of the relay node is then defined as $u_R(p, b) = -\xi(p, b) - kb$, since the relay is interested in helping the sensor in its task. We model the sensor and the relay in the random reporting scenario as two rational agents S and R playing a static game of complete information with continuously valued actions $p$ and $b$, both of which fall in [0, 1]. These agents follow their respective utilities $u_S(p, b)$ and $u_R(p, b)$. This game structure implies that the values $p$ and $b$ are determined by each agent independently and without knowledge of the other's choice, which fulfills the typical requirements of IoT systems to minimize the signaling between nodes, as well as offering improved robustness against wrong or missing exchanges [32]. As remarked above, the sensor does not require reception capabilities to determine the strategy, but only the knowledge of the erasure probability $f$ and of the cost parameters $c$ and $k$.
Theorem 7: The game between the sensor and relay is an Exact Potential Game (EPG), and its potential function is given by $\phi(p, b) = -\xi(p, b) - cp - kb$.

We remark that the potential function ϕ(p, b) is slightly different from the total utility $u_S(p, b) + u_R(p, b) = -2\xi - cp - kb$, since a factor 2 is missing. Indeed, the total utility is not a potential function, as it does not meet the conditions for an exact potential. As stated in [43, Lemma 2.7], the potential function we identified is unique, aside from an additive constant term.
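The exact potential property can be checked numerically for any penalty function. The sketch below assumes the utilities $u_S = -\xi - cp$ and $u_R = -\xi - kb$ and the potential $\phi = -\xi - cp - kb$; the penalty `toy_xi`, the cost values, and all function names are hypothetical placeholders chosen only for illustration. It also shows that the total utility fails the same check.

```python
import random


def make_game(xi, c, k):
    """Build the two utilities and the candidate potential from a penalty xi."""
    def u_s(p, b):
        return -xi(p, b) - c * p

    def u_r(p, b):
        return -xi(p, b) - k * b

    def phi(p, b):
        return -xi(p, b) - c * p - k * b

    return u_s, u_r, phi


def is_exact_potential(u_s, u_r, cand, trials=1000, seed=0, tol=1e-9):
    """Check that unilateral utility changes equal the changes of cand."""
    rng = random.Random(seed)
    for _ in range(trials):
        p, p2, b, b2 = (rng.random() for _ in range(4))
        # Sensor deviates from p to p2 while b stays fixed.
        if abs((u_s(p, b) - u_s(p2, b)) - (cand(p, b) - cand(p2, b))) > tol:
            return False
        # Relay deviates from b to b2 while p stays fixed.
        if abs((u_r(p, b) - u_r(p, b2)) - (cand(p, b) - cand(p, b2))) > tol:
            return False
    return True


def toy_xi(p, b):
    """Hypothetical penalty, used only to exercise the check."""
    return 1.0 / (0.1 + p) + (1.0 - b) ** 2


if __name__ == "__main__":
    u_s, u_r, phi = make_game(toy_xi, c=100.0, k=10.0)
    print(is_exact_potential(u_s, u_r, phi))
    print(is_exact_potential(u_s, u_r, lambda p, b: u_s(p, b) + u_r(p, b)))
```

The first check holds for any choice of ξ, while the second fails because the total utility counts the penalty twice.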
Corollary 3: The game has at least one pure-strategy NE, and it is a maximum of the potential function.

Proof: The corollary follows from applying [43, Lemma 2.1] and [43, Lemma 4.3]; we refer to that work for a more complete analysis of the properties of potential games.
The pure-strategy NE can be found in a computationally efficient way that also translates into a distributed system management, through the procedure known as fictitious play [27], which in essence corresponds to each node working independently to locally maximize its own utility function without the need of coordinating with the other node. The same procedure can be followed to prove that $\phi(T, b) = -\xi - c/T - kb$ is a potential function for the periodic reporting scenario, with the same corollary.

A. Random Reporting
If we consider the random reporting system, and use expected age as a target, i.e., ξ(p, b) = ∆(p, b), the game is an EPG over a compact, continuous move space for both players, as we have p, b ∈ [0, 1]. In this case, a pure NE can be found by finding the local maxima of the potential function ϕ(p, b). The NE condition can then be stated as in (38).

Theorem 8: The scenario with random reporting and an expected age objective has a single pure NE, which is either at the boundaries (i.e., one of the two players is either always or never active) or given by:

Proof: Using $q$ instead of $b$, we can rewrite the second condition of (38) as ∂ξ(p, b)/∂q = k. After computing the partial derivatives, we then have the following conditions: It is immediate to see that the first equation has a single positive solution. A simple way to guarantee that p ∈ (0, 1) is to impose c ≥ 1, which also makes sense if compared with a strategic case without relays [41]. On the other hand, the second equation has two solutions, one of which is always greater than 1. We then have three cases:
1) If the second solution is negative, we need to compare the potential of the two cases with q = 0 and q = 1, computing p as the sensor's best response in each case. The case with the highest potential is the NE;
2) If the second solution is greater than 1, the NE is given by q = 1, while p is the best response as given by the first condition;
3) If the second solution is in (0, 1), we can get the solution from the Theorem statement with some algebraic manipulation.
In all three cases, there is a single maximum of the potential function in the pure strategy space, so there is a single pure NE.
Corollary 4: The relay node in the random reporting scenario, considering the expected age as a system penalty function, is always active (i.e., q = 0) if: while it is always inactive if:

Proof: The behavior of the relay node ultimately depends on the numerical value of $k$ (as is also intuitive). The higher the value of $k$, the lower the probability of an active relay becomes. The first condition can be easily derived by imposing a negative solution to (40); the NE is on the left boundary (i.e., q = 0) by a continuity argument: as $k$ decreases, the value of $q$ should also decrease, as the energy cost for the relay is smaller. In the same way, we impose a solution larger than 1 to derive the second condition.
The strategic choice for q = 0 is $p = c^{-0.5}$, while it is $p = (c(1 - f))^{-0.5}$ if the strategic relay is never active, i.e., q = 1. In the case in which $p$ and $b$ both fall in inner points of [0, 1], their numerical values can be found by an iterative approach, where an initial value $p = p^{(0)}$ can be set and then used to solve (39) to derive any $q^{(i)}$ from $p^{(i)}$ and then $p^{(i+1)}$ from $q^{(i)}$. This procedure would correspond to the technique known as fictitious play [27]. As the game is an EPG, it possesses the Approximate Finite Improvement Property (AFIP) [43], which states that every improvement path that reaches a regret smaller than ε, with ε > 0, is finite. Convergence is then guaranteed, as every iteration of fictitious play increases the potential, and we reach an ε-equilibrium in a finite number of steps thanks to the AFIP.
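The alternating procedure can be sketched as follows. The expected AoI used here, `expected_aoi`, is a closed form that we derived for a simplified slot model (relay delivery exactly one slot after a failed transmission, age measured from 0); it is an assumption and is not copied from (20), although it reproduces the limiting values $p^{-1} - 1 + f$ for q = 0 and the boundary choices $p = c^{-0.5}$ and $p = (c(1-f))^{-0.5}$ quoted above. The best-response expressions are the stationary points of the potential under that assumed form, clipped to [0, 1]; all names and parameter values are hypothetical, and the code requires 0 < f < 1.

```python
import math


def expected_aoi(p, q, f):
    """Assumed closed form of the expected AoI (see the lead-in)."""
    return 1 / p - 1 + f * (p + (1 - p) * q) / (p * (1 - f * q))


def potential(p, q, f, c, k):
    """phi = -xi - c*p - k*b with b = 1 - q."""
    return -expected_aoi(p, q, f) - c * p - k * (1 - q)


def best_p(q, f, c):
    """Sensor best response: stationary point in p, capped at 1."""
    return min(1.0, 1 / math.sqrt(c * (1 - f * q)))


def best_q(p, f, k):
    """Relay best response: smaller root of the quadratic in q, clipped."""
    root = (1 - math.sqrt(f * (1 / p - 1 + f) / k)) / f
    return min(1.0, max(0.0, root))


def iterate_best_responses(f, c, k, p0=0.5, tol=1e-12, max_iter=1000):
    """Alternate q(i) from p(i) and p(i+1) from q(i) until both settle."""
    p, q = p0, 1.0
    for _ in range(max_iter):
        q_new = best_q(p, f, k)
        p_new = best_p(q_new, f, c)
        if abs(p_new - p) < tol and abs(q_new - q) < tol:
            return p_new, q_new
        p, q = p_new, q_new
    return p, q


if __name__ == "__main__":
    p, q = iterate_best_responses(f=0.5, c=100.0, k=10.0)
    print(p, q, potential(p, q, 0.5, 100.0, 10.0))
```

Because the assumed potential is concave in each variable separately, clipping the stationary point gives the best response on [0, 1], and the iteration stops when neither node can improve by a unilateral change.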

B. Periodic Reporting
If we consider periodic reporting, the sensor's action is the period $T \in \mathbb{N}^+$. The potential is then a mixed function $\phi : \mathbb{N}^+ \times [0, 1] \to \mathbb{R}$, and we can compute the best responses individually.
Theorem 9: The best response of the relay $q^*(T)$ is

Proof: The partial derivative ∂ϕ(p, q)/∂q is given by: As for the random reporting case, one of the solutions of the quadratic equation identifying the local maxima is always greater than 1. The pure NE then corresponds to the other solution, which is the one given in the Theorem, if it is in (0, 1). Following the same reasoning as for the random reporting case, the best response is 0 if both solutions are negative and 1 if the positive solution is greater than 1.

Lemma 8: The potential function ϕ(T, q) has either one or two maxima in $\mathbb{N}^+$ for a given value of $q$, which are determined by the value $T_d(q)$: If $T_d(q) \in \mathbb{N}^+$, the potential has two maxima, which are $T_d(q)$ and $T_d(q) + 1$. Otherwise, it has a single maximum, which is given by $\lceil T_d(q) \rceil$.
Proof: We consider the conditions under which ϕ(T, q) = ϕ(T + 1, q): The only positive solution to this quadratic equation is given by $T_d(q)$, as defined in the Lemma. We can also consider that, since the quadratic equation is convex, ϕ(T, q) is strictly decreasing, i.e., ϕ(T + 1, q) < ϕ(T, q), for any $T > T_d(q)$, and strictly increasing for any $T < T_d(q)$. We then consider the case in which $T_d(q) \notin \mathbb{N}^+$: in this case, $\lfloor T_d(q) \rfloor < T_d(q)$, and so $\phi(\lceil T_d(q) \rceil) > \phi(\lfloor T_d(q) \rfloor)$. However, since $\lceil T_d(q) \rceil > T_d(q)$, we know that the maximum must be $\lceil T_d(q) \rceil$, as the function becomes strictly decreasing. On the other hand, if $T_d(q) \in \mathbb{N}^+$, $\phi(T_d(q), q) = \phi(T_d(q) + 1, q)$, and the potential function has two maxima.
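The ceiling rule in the Lemma can be illustrated with a brute-force comparison. The potential below is a reconstruction under assumptions, not a quotation: the expected AoI is taken as $(T-1)/2$ plus the two summed bias terms with $\sigma = 1 - fq$, and the sensor cost is $c/T$. The indifference point computed by `indifference_point` is the positive root of ϕ(T, q) = ϕ(T + 1, q) for that assumed potential only; names and parameter values are hypothetical.

```python
import math


def potential(T, q, f, c, k):
    """Assumed potential for periodic reporting (see the lead-in)."""
    aoi = (T - 1) / 2 + (f * (1 - q) + T * f * q) / (1 - f * q)
    return -aoi - c / T - k * (1 - q)


def indifference_point(q, f, c):
    """Positive root of phi(T, q) = phi(T + 1, q) for the potential above."""
    ratio = (1 - f * q) / (1 + f * q)
    return (-1 + math.sqrt(1 + 8 * c * ratio)) / 2


def best_period(q, f, c):
    """Ceiling rule: the smallest integer not below the indifference point."""
    return max(1, math.ceil(indifference_point(q, f, c)))


def brute_force_period(q, f, c, k, t_max=1000):
    """Exhaustive search over T = 1, ..., t_max, for comparison."""
    return max(range(1, t_max + 1), key=lambda T: potential(T, q, f, c, k))


if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        print(q, best_period(q, 0.3, 100.0), brute_force_period(q, 0.3, 100.0, 10.0))
```

When the indifference point is itself an integer, the ceiling rule returns the first of the two tied maxima.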
There is a boundary condition to take into account, as the potential is always monotonically decreasing if $T_{\max} < 1$, which happens if $c < \frac{1 + fq}{4(1 - fq)}$. In that case, the optimal choice is always T = 1, i.e., the sensor transmits regardless of the relay's activity or failure probability. We can then simply impose $c \geq \frac{1 + f}{4(1 - f)}$ to avoid this edge case.

Theorem 10: If the best response of the sensor $T^*(q)$ is then fictitious play approximately converges to a pure NE in a finite number of steps.

Proof: We can use Lemma 8 to easily verify that the best response is unique if $T_d(q) \notin \mathbb{N}^+$, and approximate convergence in a finite number of steps is guaranteed by [44, Theorem 7]. On the other hand, if $T_d(q) \in \mathbb{N}^+$, one of the two best responses is on an improvement path, as it increases the potential. By verifying the next best response of the relay, we ensure that there are no loops and that each best response improves the potential.
Corollary 5: The relay node in the periodic reporting scenario, considering the expected age as a system penalty function, is always active (i.e., q = 0) if: while it is always inactive if:

Proof: We first consider the case in which q = 0. In order for $q$ to be 0, the condition is the following: We can substitute the optimal value of $T$ for q = 0, denoted as $T_0^*$. After some algebraic steps, the condition is proven. We can follow the same logic to prove the second condition, which is verified if: We then substitute the optimal value of $T$ for q = 1, denoted as $T_1^*$. The resulting condition then corresponds to the one stated in the corollary. If either of these conditions is verified, the value of $q$ is either 0 or 1, and the value of $T$ is $T_0^*$ or $T_1^*$, respectively.

In the periodic reporting scenario, we may have two pure NEs, due to the discretization of the sensor's action space; the selection of the NE then depends on the initial strategy in the iterated best response. One NE will be more advantageous for the sensor, while the other will be better for the relay. Since we assume that both nodes are cooperating, and just need to decide on a common strategy, we will pick the NE with the highest potential in each scenario, which may favor either node.
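The dependence of the selected NE on the initial strategy can be illustrated by starting the iterated best response from both ends of the relay's action range. The sketch relies on the same assumed potential as a reconstruction (expected AoI equal to $(T-1)/2$ plus the summed bias terms with $\sigma = 1 - fq$, sensor cost $c/T$); the best responses are derived from that assumed form, and every name and parameter value is hypothetical. The code requires 0 < f < 1.

```python
import math


def potential(T, q, f, c, k):
    """Assumed potential for periodic reporting (see the lead-in)."""
    aoi = (T - 1) / 2 + (f * (1 - q) + T * f * q) / (1 - f * q)
    return -aoi - c / T - k * (1 - q)


def best_period(q, f, c):
    """Sensor best response: ceiling of the indifference point."""
    ratio = (1 - f * q) / (1 + f * q)
    return max(1, math.ceil((-1 + math.sqrt(1 + 8 * c * ratio)) / 2))


def best_q(T, f, k):
    """Relay best response: smaller root of the quadratic in q, clipped."""
    root = (1 - math.sqrt(f * (T - 1 + f) / k)) / f
    return min(1.0, max(0.0, root))


def iterate_from(q0, f, c, k, max_iter=100):
    """Alternate best responses until the period stops changing."""
    T = best_period(q0, f, c)
    q = q0
    for _ in range(max_iter):
        q = best_q(T, f, k)
        T_next = best_period(q, f, c)
        if T_next == T:          # mutual best responses: a pure NE
            return T, q
        T = T_next
    return T, q


def find_equilibria(f, c, k):
    """Run the iteration from q = 0 and q = 1 and keep distinct outcomes."""
    found = []
    for q0 in (0.0, 1.0):
        T, q = iterate_from(q0, f, c, k)
        if not any(T == T2 and abs(q - q2) < 1e-9 for T2, q2 in found):
            found.append((T, q))
    return found


if __name__ == "__main__":
    eqs = find_equilibria(0.5, 100.0, 10.0)
    best = max(eqs, key=lambda e: potential(e[0], e[1], 0.5, 100.0, 10.0))
    print(eqs, best)
```

Cooperative nodes can then keep the outcome with the highest potential, as described above.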

C. Full Synchronization
Under full synchronization, both nodes take discrete actions. We then follow the same strategy as the periodic case, showing that the best response strategies are on an improvement path, and thus fictitious play converges to a pure NE.
Theorem 11: The best response $T^*(M)$ of the sensor for a given relay period $M$ is where $M^*(T)$ is the best response strategy for the relay, and value $T_d(M)$ is defined as: If the sensor follows this strategy, the best response is on an improvement path.

Proof: As in the proof of Lemma 8, we consider the conditions under which ϕ(T, M) = ϕ(T + 1, M), obtaining the following quadratic equation: This equation has a negative solution, while the other solution is positive if $c$ is sufficiently large. We can then follow the same steps as for Theorem 10 to show that $T^*(M)$ is a best response and is on an improvement path.
Theorem 12: The best response $M^*(T)$ of the relay for a given sensor period $T$ is The value of $M_d(T)$ is given by: where $W(\cdot)$ is the Lambert W function [45]. The constant factors are given by α = 1 − f and β = 1 . If the function has no positive solution, the best response is always M = +∞, i.e., the relay never activates.
Proof: Computing the conditions under which ϕ(T, M) = ϕ(T, M + 1) leads to the following equation: The solution to the equation is then $M_d(T)$ as given by (56), and we can follow the proof of Theorem 10.

The discretization of choices also means that there might be two NEs. It is sufficient to start fictitious play from M = 1 and from M = +∞ to find both, and as the nodes are cooperative, they can select the NE with the highest potential and achieve the global optimum.

D. Risk Optimization
If we consider the risk optimization scenario, i.e., set $\xi(p, b) = 1 - P_\Psi(\psi_{\mathrm{thr}})$, there is no analytical solution, as the CDF of the PAoI is exponential, and finding the maxima requires solving high-degree polynomial equations. In this case, we can find an approximate solution by discretizing the action spaces for both nodes and finding the pure NEs with the Lemke-Howson algorithm [46], which takes the bimatrix form of the game as input. This might be computationally rather complex, as the worst-case complexity of the Lemke-Howson algorithm is exponential in the size of the action space, but we found that a rough action space quantization is often enough to reach a good solution, resulting in quick convergence in practice.
In the random reporting case, the quantization interval λ determines the approximation level of the NE. In the periodic reporting case, the relay's action space still needs to be quantized, while the sensor's action space is already discrete, if unbounded. We then only need to set a maximum transmission period in order to obtain a finite discrete action space.
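The quantized game can be explored with a simple exhaustive search, sketched below. This is not the Lemke-Howson algorithm: it is a plain enumeration of pure NEs over the payoff tables, swapped in because it is short and easy to verify. Since the PAoI distribution is not reproduced in this section, the example uses an assumed expected-AoI penalty (a closed form derived for a simplified slot model) as a stand-in for the risk penalty; the grid step of 0.01 and all names and cost values are hypothetical.

```python
def expected_aoi(p, b, f):
    """Assumed stand-in penalty: expected AoI with relay activity b = 1 - q."""
    q = 1 - b
    return 1 / p - 1 + f * (p + (1 - p) * q) / (p * (1 - f * q))


def pure_nash_equilibria(u_s, u_r, p_grid, b_grid, tol=1e-9):
    """Enumerate the cells where neither player gains by deviating alone."""
    us = [[u_s(p, b) for b in b_grid] for p in p_grid]
    ur = [[u_r(p, b) for b in b_grid] for p in p_grid]
    # Best sensor payoff for each relay action (column maxima).
    best_s = [max(us[i][j] for i in range(len(p_grid))) for j in range(len(b_grid))]
    # Best relay payoff for each sensor action (row maxima).
    best_r = [max(row) for row in ur]
    return [
        (p_grid[i], b_grid[j])
        for i in range(len(p_grid))
        for j in range(len(b_grid))
        if us[i][j] >= best_s[j] - tol and ur[i][j] >= best_r[i] - tol
    ]


def build_example(f=0.5, c=100.0, k=10.0, step=0.01):
    """Quantized action grids and utilities for the illustrative game."""
    n = round(1 / step)
    p_grid = [i * step for i in range(1, n + 1)]   # p = 0 is excluded
    b_grid = [j * step for j in range(0, n + 1)]
    u_s = lambda p, b: -expected_aoi(p, b, f) - c * p
    u_r = lambda p, b: -expected_aoi(p, b, f) - k * b
    phi = lambda p, b: -expected_aoi(p, b, f) - c * p - k * b
    return u_s, u_r, phi, p_grid, b_grid


if __name__ == "__main__":
    u_s, u_r, phi, p_grid, b_grid = build_example()
    print(pure_nash_equilibria(u_s, u_r, p_grid, b_grid))
```

In a potential game, the grid point that maximizes the potential is always among the returned equilibria, which gives a quick consistency check on the enumeration.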

VII. NUMERICAL RESULTS
We now consider the results in some possible scenarios, both considering the AoI and the game theoretic optimization. We verified the analytical results by running a Monte Carlo simulation for $10^7$ slots. The parameters for all simulations are reported in the figure captions. The Monte Carlo results confirmed that the derived PAoI distributions were correct, as Fig. 2 clearly shows. The expected AoI results are shown in Fig. 3, confirming the validity of the analysis. We selected the same average activation rates for the three systems in both figures, and performance for schemes with the same average traffic is shown using the same color in Fig. 3: as expected, higher traffic results in a lower average AoI. We can also note that the periodic reporting system is second-order stochastically dominant [47] over random reporting: the average AoI for the former is smaller than for the latter, and the tail is more contained as well. The same is true for the fully synchronized system over the periodic reporting one, at the cost of some signaling to maintain synchronization between the sensor and relay.
After verifying the correctness of the PAoI analysis, we consider the game theoretic optimization, using expected AoI as the system penalty: Fig. 4 shows the resulting NE for the three systems, as well as the potential, comparing strategic choices to relays with a fixed activation probability. In the random reporting system, the activation probability $p$ for the sensor increases gradually with $f$ in the case of a strategic relay, while the relay is always inactive until the erasure probability is relatively high, as Fig. 4a shows: indeed, there is no need for a relay to assist an error-free communication. Naturally, a lower value of the relay activity cost $k$ would lead it to be active even for lower values of $f$, and vice versa. If $f$ increases further, the relay's activity rate increases, while the sensor's decreases: as the channel between the sensor and the receiver becomes worse, it becomes more convenient for the relay to shoulder the cost of the transmission, as it can guarantee a reliable delivery, while a higher activation rate from the sensor would provide diminishing returns. Fig. 4b also shows that the game theoretic optimization allows us to maximize the potential with respect to fixed strategies, improving the overall system utility.
The equivalent results for the periodic reporting and fully synchronized systems are shown in Fig. 4c-e and Fig. 4f-h, respectively. In these cases, we have two NEs, with slightly different values of the potential: the first NE tends to shift the burden of the communication towards the relay, with a higher value of $b$ and a longer transmission period $T$ for the sensor, while the other NE does the opposite. As we remarked in the previous section, this is due to the discretization of the sensor's action space. The value of the potential in these scenarios is higher, as the expected AoI is generally lower for the same settings. In a cooperative system, the two nodes can then compute both NEs, selecting the one with the highest system-level utility, which we will use in all the following analyses.
We can then consider the effect of the relay cost parameter $k$, shown in Fig. 5. As expected, the expected AoI in the random reporting system is generally lower for lower values of $k$, as shown in Fig. 5a: the AoI is only the same for very low values of $f$, when the relay is always inactive for any value of $k$. If we look at the higher values of $f$, the AoI for k = 5 and k = 10 becomes the same as for k = 1 over a certain $f$, as the relay activity rate becomes 1. If we consider higher values of $k$, the sensor must increase its activity to balance higher values of $f$, as it is not able to rely on the backup provided by the relay. We can then define three activity regimes by looking at the relay activation probability in Fig. 5c:
• A low-error regime, in which the sensor activity rate increases linearly, while the relay is inactive;
• A high-error regime (only reached if the sensor activity cost is high), in which the relay is always active, and the sensor activity rate is fixed, as the benefits in terms of expected AoI are not enough to offset the cost;
• An intermediate regime, in which $p$ decreases with $f$, while $b$ increases.

A similar pattern appears in Fig. 5d-f for periodic reporting. Interestingly, the discrete nature of the reporting period leads to a staircase-shaped AoI pattern: the expected age gradually grows as $f$ increases, then sharply drops when the cost becomes high enough that the sensor reduces its period. When the relay is also active, the age trend becomes smooth again, as $b$ can take continuous values. In the fully synchronized system, shown in Fig. 5g-i, both the low-error and intermediate-error regimes present the sawtooth pattern, as the relay's actions are also discrete. We note that, in the low- and high-error regimes, the periodic and fully synchronized systems behave identically, as the relay is either always or never active, and the difference between the two manifests only in the intermediate regime.
We can also perform the same evaluation as a function of the sensor activity cost $c$, as shown in Fig. 6. The three activity regimes are still present in the random reporting scheme, as shown in Fig. 6a, and lower values of $c$ lead to a higher activity from the sensor (and a lower activity from the relay), as well as a generally lower AoI. As Fig. 6d-f show, the periodic reporting system shows a similar pattern, although the high-error regime is reached earlier, and the expected AoI is generally lower. We can note the same sawtooth pattern in the low-error regime that we remarked on above; the same considerations hold for the fully synchronized system, shown in Fig. 6g-i.
We can then consider the threshold violation probability optimization, in which the system penalty is $\xi = 1 - P_\Psi(\psi_{\mathrm{thr}})$: in this case, we considered a quantization interval λ = 0.01, which led to a quick convergence of the Lemke-Howson algorithm in all cases. Naturally, the period $T$ in the periodic reporting system is already discrete, but we capped its value to 100. The value was reduced to 30 for both $M$ and $T$ in the fully synchronized system. Fig. 7 shows the NE and threshold violation probability for the threshold violation optimization. The first thing we can note, which is evident from Fig. 7a, is that the sawtooth pattern appears in the random reporting system as well. This is due to the quantization of the action space, visible in Fig. 7b-c, and a finer quantization step would have led to a smoother curve. We also note that the sensor and relay costs need to be reduced, as the risk is naturally bounded in [0, 1] since it represents a probability, with a much lower dynamic range with respect to the expected age. In this case, the relay never reaches an activity rate of 1, while it does in the periodic reporting scenario, as shown in Fig. 7d-f. We can see that, for higher values of the sensor cost $c$, the relay activity rate grows very rapidly after a certain threshold. If the relay activity is 1, the optimal strategy for the sensor is to transmit every $\psi_{\mathrm{thr}} - 1$ steps, as this guarantees that the PAoI will never cross the threshold (any failed transmission will be recovered through the backup transmitted by the relay, with an additional delay of 1 slot). In this case, the risk drops to 0 and the energetic burden of the system is borne almost entirely by the relay. If we consider lower values of $c$, this regime is never reached, and the relay activity rate $b$ gradually increases with $f$. The fully synchronized system, shown in Fig. 7g-i, is even better at maintaining a low violation risk, as its deterministic activation pattern results in zero risk for $\psi_{\mathrm{thr}} \geq MT + 1$.

VIII. CONCLUSION AND FUTURE WORK
In this work, we studied a system in which status updates from a sensor must be delivered to a receiver over an error-prone time-slotted channel, considering the presence of a relay node that can recover failures in the subsequent slot. The relay and sensor are energy-limited and follow a duty cycle, with either memoryless or periodic activations and no coordination between the two.
We leveraged a closed-form analytical computation of the expected AoI and PAoI threshold violation probability as functions of the involved parameters, to derive a game theoretic representation of the interaction between the sensor and the relay as strategic agents driven by a common potential. We showed that such an approach can be used to derive an efficient system working point without any signaling between the sensor and relay, but just through local computation at each node. Thus, our proposed approach can be generalized to a framework for practical implementations with backup out-of-band relays in IoT scenarios [40].
Future extensions may consider models for the activity of the nodes other than i.i.d. activation probabilities, e.g., by performing stateful optimizations [38]. The same applies to the failure rate of the channel [48] and to the data generation process [49], potentially including correlation between different sources [50], [51].

Fig. 1. Illustration of the considered monitoring scenario, in which the relay intercepts information from a sensor and repeats it to ensure its correct transmission.
Proof of Theorem 7: To prove that ϕ(p, b) is a potential function [43], we first need to show that it is a potential function for $u_S(p, b)$, i.e., that $u_S(p, b) - u_S(p', b) = \phi(p, b) - \phi(p', b)$. As $b$ is fixed, this is trivially true, since the potential function is $\phi(p, b) = u_S(p, b) - kb$. Secondly, we need to show that ϕ(p, b) is also a potential function for $u_R(p, b)$. This is also true, as the potential function is $\phi(p, b) = u_R(p, b) - cp$. In both cases, the equality is valid under any definition of ξ(p, b).

Fig. 3. Expected AoI as a function of $f$ for different values of $p$ and $T$.

Fig. 5. Effect of the relay cost parameter k on the NE for the three schemes with c = 100.

Fig. 6. Effect of the sensor activity cost c on the NE for the three schemes with k = 10.