Counter Waves Link Activation Policy for Latency Control in In-Band IAB Systems

3GPP’s Integrated Access and Backhaul (IAB) architecture is expected to deliver a cost-efficient option for deploying 5G New Radio (NR) systems. However, IAB relies on multi-hop wireless communications, and packet latency therefore becomes a critical metric in such systems. Latency minimization in the in-band backhauling regime involves dynamic scheduling of active transmission links so as to avoid half-duplex conflicts, which incurs significant control overhead. In this letter, by using the formalism of Markov decision processes (MDP), we identify a general fixed link activation policy and the associated policy design algorithm for tree-shaped in-band IAB systems with half-duplex constraints. The proposed policy, named “counter waves”, does not require signaling between the IAB donor and nodes and provides stable low latency for low-to-medium traffic conditions spanning up to 60% of the capacity region of the system.


I. INTRODUCTION
Integrated Access and Backhaul (IAB) has been standardized by 3GPP [1] to remedy the small coverage area of millimeter wave (mmWave) New Radio (NR) base stations, thus eventually enabling commercial 5G mmWave NR deployments [2], [3]. IAB relies upon low-cost wireless relay nodes, IAB-nodes, to extend the coverage of the base station, which is called the IAB-donor and has a wired link to the core network.
Although IAB allows for several implementation options, including in-band and out-of-band backhauling and half-/full-duplex operation [1], in-band backhauling is considered the most efficient, as it allows the available spectrum to be fully utilized [4]. However, an in-band IAB implementation implies half-duplex operation, which forbids IAB-nodes and the IAB-donor from simultaneously receiving and transmitting over their radio interfaces. Full-duplex radio technology is expected to solve this problem in the future, but it is not yet mature enough for the commercial market [5].
Multi-hop radio backhauling in IAB brings end-to-end latency into focus as a performance measure on a par with throughput and coverage, which have been widely considered previously [6], [7]. In the case of in-band half-duplex implementation, the latency performance of an IAB network greatly depends on an efficient time division multiplexing (TDM) control policy that sets the transmit/receive modes of IAB-nodes and activates the corresponding links. Optimally, the IAB-donor takes such control decisions based on full knowledge of the network's state, including the instantaneous UE traffic requirements, which may induce significant control latency and signaling overhead [8]. The problem of latency optimization in IAB systems has been addressed in only a few studies [6], [9], [10], none of which considered decentralized half-duplex link scheduling that does not require timely and efficient signaling. The goal of this letter is to devise, for half-duplex IAB networks with multi-sector nodes, a general fixed TDM control policy that requires neither signaling nor full knowledge of the network state. A framework based on Markov decision processes (MDP) [11] is employed to develop the sought policy, with the end-to-end latency as the main optimization criterion. The main contributions of our work are:
• a mathematical MDP framework for latency-oriented link activation in IAB systems, which can be solved directly or by using reinforcement learning;
• a fixed control policy and the associated policy generation algorithm for tree-shaped IAB systems with multi-sector nodes;
• results showing that the proposed policy provides stable low latency in low-to-medium traffic conditions.
The rest of the letter is organized as follows. Section II introduces our system model and its formalization. In Section III, we obtain the system dynamics equations and formulate the MDP. In Section IV, we devise our main result, the Counter Waves (CW) link activation policy. Section V provides numerical results. Finally, conclusions are drawn in Section VI.
Manuscript received 29 August 2023; accepted 4 September 2023. Date of publication 8 September 2023; date of current version 9 November 2023. This work is supported by the Academy of Finland under the projects "Machine Learning Methods and Algorithms for 6G Terahertz Cellular Access" (HARMONIOUS) and "Enabling Mobile Terahertz Communication for 6G Cellular Networks" (EMERGENT). The associate editor coordinating the review of this letter and approving it for publication was B. Makki. (Corresponding author: Natalia Yarkina.) The authors are with the Faculty of Information Technology and Communication Sciences, Tampere University, 33720 Tampere, Finland (e-mail: natalia.yarkina@tuni.fi). Digital Object Identifier 10.1109/LCOMM.2023.3313233

II. SYSTEM MODEL
We consider an IAB network with one IAB-donor and N − 1 > 0 IAB-nodes. Let $\mathcal{N}$ denote the set of all network nodes, enumerated from 1 to N so that (i) node 1 is the IAB-donor, and (ii) if i < j ≤ N, then the number of hops from node i to node 1 does not exceed that from node j to node 1. Each node in $\mathcal{N}$ has a multi-sector design, each sector having 120-degree coverage and an antenna array operating in beamforming mode. Three-sector nodes were chosen as a typical scenario, but the model can be generalized to nodes with other numbers of sectors. Each sector provides access to its associated UEs, and some sectors also provide backhaul connectivity with another node. We enumerate sectors from 1 to 3 and assume that connectivity towards the donor is always provided by sector 1. The backhaul links between the nodes are assumed to form a spanning tree rooted at the IAB-donor, a topology suggested in [1]. This topology can be specified by an adjacency matrix $T = (T_{i,j})_{i,j \in \mathcal{N}}$, where $T_{i,j} \in \{1, 2, 3\}$ indicates the sector through which node i communicates with node j, and $T_{i,j} = 0$ if there is no backhaul connectivity between nodes i and j. The backhaul topology is fixed at the network planning stage.
Due to half-duplex operation, at any given time all sectors of a node either transmit or receive. The sectors providing backhaul connectivity can transmit/receive either backhaul or access traffic. We consider the system in discrete time indexed by n = 0, 1, 2, …, where the time unit (called a "time step") corresponds to a single transmission time interval. The controller can switch the mode of each sector once per time step. Let $a_i(n) = 0$ if, at time n, node $i \in \mathcal{N}$ is receiving, and $a_i(n) = 1$ if it is transmitting; denote $a(n) = (a_1(n), \dots, a_N(n))$. To specify the access/backhaul regime, let $B_{i,m}(n) = 0$ if sector m of node i is in access mode and $B_{i,m}(n) = 1$ if it is in backhaul mode, i.e., transmits to or receives from the associated IAB network node. If there exists $j \in \mathcal{N}$ such that $T_{i,j} = m$, sector m of node i may operate in either mode; otherwise, $B_{i,m}(n) = 0$. Since only some sectors can switch between access and backhaul, and a backhaul link is enabled only if both its incident sectors are in backhaul mode, the number of valid controls $B = (B_{i,m})_{i \in \mathcal{N}, m = 1,2,3}$ among all binary N × 3 matrices is small. It is thus convenient to represent control B as a vector $b \in \{0,1\}^{N-1}$ whose entry $b_l$ indicates whether the backhaul link corresponding to edge (i, j), indexed $l = \max\{i, j\} - 1$, is enabled. Conversely, control $B = B(b)$ can be obtained from the corresponding b as follows. Let $b = (0, \dots, 0)$ correspond to B = 0; otherwise, the entries of B are zero unless $b_l = 1$ for some l, in which case $B_{l+1,1} = 1$ and $B_{j,T_{j,l+1}} = 1$ for the node $j < l + 1$ such that $T_{l+1,j} = 1$, i.e., the parent of node l + 1. Fig. 1 provides an example illustrating the notation for the topology and controls.
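To make the mapping B = B(b) concrete, the following Python sketch builds B from b and T. It is an illustration under our own assumptions, not the authors' implementation: T is encoded as a dictionary keyed by (i, j), with absent pairs meaning $T_{i,j} = 0$, and the four-node tree in the example is hypothetical.

```python
from typing import Dict, List, Tuple

Topology = Dict[Tuple[int, int], int]  # (i, j) -> sector of node i facing node j; absent = 0


def parents(T: Topology) -> Dict[int, int]:
    """Map each IAB-node to its parent: node i reaches its parent j through sector 1.

    Parents always have smaller indices because nodes are numbered by hop count.
    """
    return {i: j for (i, j), sector in T.items() if sector == 1 and j < i}


def control_matrix(b: List[int], T: Topology, N: int) -> Dict[int, Dict[int, int]]:
    """Build B = B(b): B[i][m] = 1 if sector m of node i is in backhaul mode."""
    B = {i: {m: 0 for m in (1, 2, 3)} for i in range(1, N + 1)}
    par = parents(T)
    for l, enabled in enumerate(b, start=1):  # b_l controls the edge of child node l + 1
        if enabled:
            child, parent = l + 1, par[l + 1]
            B[child][1] = 1                    # the child faces its parent with sector 1
            B[parent][T[(parent, child)]] = 1  # the parent's sector facing the child
    return B


# Hypothetical tree: nodes 2 and 3 hang off the donor, node 4 hangs off node 2.
T = {(2, 1): 1, (1, 2): 2, (3, 1): 1, (1, 3): 3, (4, 2): 1, (2, 4): 2}
B = control_matrix([1, 0, 1], T, N=4)
print(B)
```

Enabling $b_1$ and $b_3$ puts sector 2 of the donor, sectors 1 and 2 of node 2, and sector 1 of node 4 into backhaul mode, while the all-zero vector yields B = 0.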
We assume that data packets flow in the uplink and downlink directions.The IAB-donor has a wired connection to the core network, through which the uplink packets leave the system and the downlink packets arrive.The packets waiting for transmission are held in buffers sized so that, once a link is enabled, all the packets from the corresponding buffer can be successfully transmitted within the same time step.No packet can travel over more than one link in one time step.
The IAB network operating in the in-band half-duplex regime requires an efficient control policy. Such a policy switches the sectors' modes between transmit/receive and backhaul/access, i.e., chooses controls (a(n), B(n)), n = 0, 1, 2, …, as a function of the system's state so as to optimize the metric of interest, namely the end-to-end packet latency, by which we understand the total number of time steps a packet has spent in the system. An efficient fixed control pattern that runs on a loop and does not require the exchange of state information is of particular interest, as it reduces control latency and signaling overhead.

III. SYSTEM DYNAMICS AND MDP FORMALIZATION
The end-to-end latency of any packet leaving the network at time n cannot exceed the age (in time steps) of the "oldest" packet in the network at time n − 1, increased by one. Let r(s, a, b) denote the reward that the controller gains for applying control (a, b) in state s, and assume that it is bounded and decreases with the maximum packet age in the network after the control is enacted and packets are transmitted. In this section, we construct an MDP to obtain a control policy π* maximizing the expected total discounted reward
$$\mathbb{E}^{\pi}_{s(0)} \sum_{n=0}^{\infty} \alpha^{n}\, r\big(s(n), a(n), b(n)\big), \qquad (1)$$
where α ∈ [0, 1) is a discount factor and $\mathbb{E}^{\pi}_{s(0)}$ denotes the conditional expectation given policy π and initial state s(0). To this end, we define the system's states and derive the dynamics equations relating the states at times n and n + 1.

A. System States
Let the nodes 2, . . ., N each have one uplink backhaul buffer.To describe the access state, we consider 3N uplink buffers, each of which contains packets waiting for uplink transmission in the group of UEs associated with one sector.The uplink access buffers receive uplink exogenous arrivals.
In the downlink, we account for the routes and consider N classes of packets, where class k packets are destined to UEs associated with any sector of node $k \in \mathcal{N}$. Let $\mathcal{N}_k \subset \mathcal{N}$, $k \in \mathcal{N}$, denote the set of all nodes through which class k packets travel. Denote $N_k = |\mathcal{N}_k|$ and notice that $N_k - 1$ is the distance from node k to node 1. Let each IAB network node have one downlink backhaul buffer and three downlink access buffers, one for each sector. The downlink backhaul buffer in node 1 receives all downlink exogenous arrivals. Packets leaving downlink access buffers leave the system.
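The route sets $\mathcal{N}_k$ follow directly from T by walking from node k towards the donor through sector 1. A minimal Python sketch, assuming a dictionary encoding of T (a hypothetical input format) and a made-up four-node tree:

```python
from typing import Dict, List, Set, Tuple

Topology = Dict[Tuple[int, int], int]  # (i, j) -> T_{i,j}; absent pairs mean 0


def parents(T: Topology) -> Dict[int, int]:
    """Child -> parent, using the convention that sector 1 faces the donor."""
    return {i: j for (i, j), s in T.items() if s == 1 and j < i}


def route_sets(T: Topology, N: int) -> Dict[int, Set[int]]:
    """Return N_k for every class k: the nodes on the path between the donor and node k."""
    par = parents(T)
    routes = {}
    for k in range(1, N + 1):
        path: List[int] = [k]
        while path[-1] != 1:          # climb towards the donor
            path.append(par[path[-1]])
        routes[k] = set(path)
    return routes


T = {(2, 1): 1, (1, 2): 2, (3, 1): 1, (1, 3): 3, (4, 2): 1, (2, 4): 2}
routes = route_sets(T, N=4)
for k, nodes in sorted(routes.items()):
    # |N_k| - 1 is the hop distance from node k to the donor
    print(k, sorted(nodes), "hops:", len(nodes) - 1)
```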
We describe the buffers' states by the age of the "oldest" packet therein, defined as the number of time steps the packet has spent in the system. Denote by $v_i(n)$ the age of the oldest packet in the uplink backhaul buffer of node $i \in \mathcal{N} \setminus \{1\}$ at time n, and let $v_i(n) = 0$ if the buffer is empty. Denote by $w_{k,i}(n)$ the age of the oldest class k packet in the downlink backhaul buffer of node $i \in \mathcal{N}$ at time n ($w_{k,i}(n) = 0$ if there is no such packet). Let $V_{i,m}(n)$ and $W_{i,m}(n)$ represent the maximum packet ages in, respectively, the uplink and downlink access buffers of sector $m \in \{1, 2, 3\}$ of node $i \in \mathcal{N}$. Now, the system state at time n is specified by $s(n) = (v(n), V(n), w(n), W(n))$.

B. Dynamics Equations
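Controls enable different subsets of links so that packets can travel through the network. Given the controls a(n) = a and B(n) = B, the backhaul connectivity at time n is specified by an adjacency matrix R(n) with entries
$$R_{i,j}(n) = \mathbb{1}_{T_{i,j} > 0}\, a_i (1 - a_j)\, B_{i,T_{i,j}} B_{j,T_{j,i}}, \qquad (2)$$
where $\mathbb{1}_x = 1$ if x is true and 0 otherwise. Thus, $R_{i,j}(n) = 1$ if the backhaul link from node i to node j is enabled by the chosen control, and $R_{i,j}(n) = 0$ otherwise. To describe the uplink access connectivity at time n, let $r^{\mathrm{UL}}(n) = (r^{\mathrm{UL}}_1, \dots, r^{\mathrm{UL}}_N)$, where
$$r^{\mathrm{UL}}_{i,m}(n) = (1 - a_i)(1 - B_{i,m}), \quad m = 1, 2, 3. \qquad (3)$$
Similarly, for the downlink access we have $r^{\mathrm{DL}}(n)$ with
$$r^{\mathrm{DL}}_{i,m}(n) = a_i (1 - B_{i,m}), \quad m = 1, 2, 3. \qquad (4)$$
Thus, $r^{\{\mathrm{UL},\mathrm{DL}\}}_{i,m}(n) = 1$ or 0 according as the corresponding access link is active or not. Now we can formalize the state change from time step n to n + 1. Let $\theta^{\mathrm{UL}}_{i,m}(n)$ be the indicator of whether the uplink access buffer in sector m of node i received exogenous arrivals at time n. Then, for $i \in \mathcal{N}$, m = 1, 2, 3, we have
$$V_{i,m}(n+0) = \big(1 - r^{\mathrm{UL}}_{i,m}(n)\big) V_{i,m}(n), \qquad V_{i,m}(n+1) = \max\big\{\phi\big(V_{i,m}(n+0)\big),\, \theta^{\mathrm{UL}}_{i,m}(n)\big\}. \qquad (5)$$
Here, the first equation zeroes the entries corresponding to active links, since the packets leave these buffers to move forward. The second equation finalizes the new state by accounting for new arrivals and by increasing the age of the remaining packets by 1 via the function ϕ(x), which adds 1 to its argument if it is nonzero.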
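The uplink access-buffer update described above can be sketched as follows. This is our reading of the text rather than the authors' code: it assumes that a packet arriving during step n has age 1 at step n + 1, and all function and variable names are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[int, int]  # (node i, sector m)


def phi(x: int) -> int:
    """Age a non-empty buffer by one step; an empty buffer (age 0) stays empty."""
    return x + 1 if x > 0 else 0


def uplink_access_step(V: Dict[Key, int], a: Dict[int, int],
                       B: Dict[int, Dict[int, int]],
                       theta: Dict[Key, int]) -> Dict[Key, int]:
    """One time step of the uplink access-buffer ages V_{i,m}."""
    V_next = {}
    for (i, m), age in V.items():
        r_ul = (1 - a[i]) * (1 - B[i][m])  # node i receives and sector m is in access mode
        transitory = age * (1 - r_ul)      # an active link empties the buffer
        V_next[(i, m)] = max(phi(transitory), theta[(i, m)])  # aging plus new arrivals
    return V_next


V = {(1, 1): 3, (1, 2): 0, (1, 3): 2}
a = {1: 0}                     # node 1 is receiving
B = {1: {1: 0, 2: 0, 3: 1}}    # sector 3 is busy with backhaul
theta = {(1, 1): 1, (1, 2): 0, (1, 3): 1}
print(uplink_access_step(V, a, B, theta))
```

Sector 1 is drained and refilled by a fresh arrival (age 1), sector 2 stays empty, and sector 3, which is serving backhaul, keeps its oldest packet, now aged 3.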
The packets leaving access buffers in nodes $i \in \mathcal{N} \setminus \{1\}$ move to the backhaul buffers in these nodes. We denote their maximum ages as in (6). Let $R^{\mathrm{UL}}(n)$ be a lower triangular matrix of order N whose elements are obtained from R(n) as
$$R^{\mathrm{UL}}_{i,j}(n) = R_{i,j}(n)\, \mathbb{1}_{i > j}. \qquad (7)$$
Now, the states of the backhaul buffers, $i \in \mathcal{N} \setminus \{1\}$, are given by (8). In the downlink, similarly, let $\theta^{\mathrm{DL}}_k(n) = 1$ if class $k \in \mathcal{N}$ exogenous arrivals occurred at time step n, and $\theta^{\mathrm{DL}}_k(n) = 0$ otherwise. For any class $k \in \mathcal{N}$, the downlink backhaul routing matrix $R^{\mathrm{DL}}_k(n)$ at time n is upper triangular of size N, with its entries obtained from R(n) as
$$\big(R^{\mathrm{DL}}_k(n)\big)_{i,j} = R_{i,j}(n)\, \mathbb{1}_{i < j}\, \mathbb{1}_{i \in \mathcal{N}_k}\, \mathbb{1}_{j \in \mathcal{N}_k}. \qquad (9)$$
Now, the transitory state of the backhaul buffers after backhaul packet transmission and packet "aging" is given by (10). Then, for any $i, k \in \mathcal{N}$, the state of a backhaul buffer is
$$w_{k,i}(n+1) = \begin{cases} \max\big\{w_{k,1}(n+0),\, \theta^{\mathrm{DL}}_k(n)\big\}, & i = 1, \\ w_{k,i}(n+0), & \text{otherwise}. \end{cases} \qquad (11)$$
When distributing a batch of downlink packets arriving from backhaul among the access buffers of the destination node, we assume the worst case, in which a packet with the maximum age goes to each sector. The maximum packet age in such a batch is given by $w_{k,k}(n+0)$. Then, the quantity (12) represents the maximum age, increased by 1, of the downlink packets that arrived in time step n from backhaul to access. Now, the state of the access buffers at time n + 1 is given by (13).
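The connectivity matrix R(n) and the routing matrices derived from it can be computed as in the sketch below. It assumes that $R_{i,j}(n) = 1$ exactly when node i transmits, node j receives, and both incident sectors are in backhaul mode; the chain topology, the controls and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Topology = Dict[Tuple[int, int], int]
Matrix = List[List[int]]  # (N + 1) x (N + 1); row and column 0 are unused


def connectivity(a: Dict[int, int], B: Dict[int, Dict[int, int]],
                 T: Topology, N: int) -> Matrix:
    """R[i][j] = 1 if node i transmits to node j over an enabled backhaul link."""
    R = [[0] * (N + 1) for _ in range(N + 1)]
    for (i, j), s_ij in T.items():
        s_ji = T.get((j, i), 0)
        if s_ij and s_ji and a[i] == 1 and a[j] == 0 and B[i][s_ij] == 1 and B[j][s_ji] == 1:
            R[i][j] = 1
    return R


def uplink_routing(R: Matrix) -> Matrix:
    """Keep only child -> parent transmissions (i > j): a lower triangular matrix."""
    n = len(R)
    return [[R[i][j] if i > j else 0 for j in range(n)] for i in range(n)]


def downlink_routing(R: Matrix, route_k: Set[int]) -> Matrix:
    """Keep parent -> child transmissions (i < j) that lie on the route of class k."""
    n = len(R)
    return [[R[i][j] if i < j and i in route_k and j in route_k else 0
             for j in range(n)] for i in range(n)]


# Hypothetical chain 1 - 2 - 3 in which node 2 receives from both neighbours.
T = {(2, 1): 1, (1, 2): 2, (3, 2): 1, (2, 3): 2}
a = {1: 1, 2: 0, 3: 1}
B = {1: {1: 0, 2: 1, 3: 0}, 2: {1: 1, 2: 1, 3: 0}, 3: {1: 1, 2: 0, 3: 0}}
R = connectivity(a, B, T, N=3)
R_ul = uplink_routing(R)
R_dl3 = downlink_routing(R, {1, 2, 3})
print(R_ul[3][2], R_dl3[1][2])
```

In this example, node 3 sends uplink traffic to node 2 while the donor sends class 3 downlink traffic to node 2 in the same time step.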

C. Markov Decision Process
Let us now formalize an MDP for optimal control under the worst-case full-buffer assumption, i.e., $\theta^{\mathrm{UL}}_{i,m}(n) = \theta^{\mathrm{DL}}_k(n) = 1$ for all i, m, k, and n.

2) States (finite): We require that no packet experience a latency larger than some $\tau > N_N$, and let the set of model states, $\mathcal{S}$, consist of all states s = (v, V, w, W) whose entries lie in {0, 1, …, τ} and that can be obtained from state 0 by (5), (6), (8) and (10)-(13) for a finite sequence of controls.
3) Actions (finite): Let $\mathcal{S}_\tau \subset \mathcal{S}$ denote the set of terminal states, i.e., the states in which at least one entry equals τ. The set of feasible actions in non-terminal states $s \in \mathcal{S} \setminus \mathcal{S}_\tau$ is
$$\mathcal{A}(s) = \big\{(a, b) \in \{0,1\}^{N} \times \{0,1\}^{N-1} : b_i \le |a_{i+1} - a_j| \ \ \forall i < N \text{ and } j < i + 1 \text{ such that } T_{i+1,j} = 1\big\}. \qquad (14)$$
4) Rewards: We define the reward that the controller gains for applying control (a, b) in state s via the maximum packet age in the network after enacting the control, packet transmission and aging, but before new exogenous arrivals, as in (15).
5) Transition probabilities: Under the full-buffer assumption, the transitions of the process are deterministic and follow the system equations derived above.
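For small networks, the feasible actions can be enumerated by brute force. The sketch below assumes that a backhaul link may be enabled only when one of its end nodes transmits and the other receives; the parent map of a three-node chain is a hypothetical input.

```python
from itertools import product
from typing import Dict, List, Tuple

Action = Tuple[Tuple[int, ...], Tuple[int, ...]]


def feasible_actions(N: int, par: Dict[int, int]) -> List[Action]:
    """All (a, b) such that every enabled backhaul link joins a transmitter and a receiver."""
    actions = []
    for a in product((0, 1), repeat=N):          # a[i - 1] = 1: node i transmits
        for b in product((0, 1), repeat=N - 1):  # b[l - 1]: edge of child node l + 1
            ok = all(
                not b[l - 1] or a[l] != a[par[l + 1] - 1]
                for l in range(1, N)
            )
            if ok:
                actions.append((a, b))
    return actions


chain3 = {2: 1, 3: 2}  # hypothetical chain 1 - 2 - 3 (child -> parent)
print(len(feasible_actions(3, chain3)))  # 18 of the 8 * 4 = 32 pairs
```

Each transmit/receive vector a admits $2^d$ backhaul vectors, where d is the number of tree edges whose end nodes are in opposite modes, which is why the action set stays small compared with all binary pairs.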
Assuming the expected total discounted reward criterion (1) and adopting the dynamic programming method [11], we can now obtain an optimal IAB link activation policy. Furthermore, since the transitions of the process are deterministic and the state space is finite, the trajectory generated by this policy from the initial state eventually enters a cycle; the resulting policy is therefore a fixed control sequence that can be applied on a loop.
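Why determinism and finiteness lead to a looped control sequence can be illustrated with generic value iteration on a toy deterministic MDP: once the greedy policy is fixed, following it from the initial state must eventually revisit a state, after which the controls repeat. The four-state ring, its reward and the iteration count below are hypothetical and unrelated to the IAB model, whose states, transitions and rewards would instead come from the dynamics equations.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable


def value_iteration(states: Sequence[State], actions: Sequence[Action],
                    step: Callable[[State, Action], State],
                    reward: Callable[[State, Action], float],
                    alpha: float, iters: int = 500) -> Dict[State, Action]:
    """Greedy policy for a deterministic MDP with discount factor alpha."""
    value = {s: 0.0 for s in states}
    for _ in range(iters):
        value = {s: max(reward(s, u) + alpha * value[step(s, u)] for u in actions)
                 for s in states}
    return {s: max(actions, key=lambda u: reward(s, u) + alpha * value[step(s, u)])
            for s in states}


def control_cycle(policy: Dict[State, Action], step: Callable[[State, Action], State],
                  s0: State) -> List[Tuple[State, Action]]:
    """Follow the policy from s0 until a state repeats; return the repeating loop."""
    seen: Dict[State, int] = {}
    trace: List[Tuple[State, Action]] = []
    s = s0
    while s not in seen:
        seen[s] = len(trace)
        trace.append((s, policy[s]))
        s = step(s, policy[s])
    return trace[seen[s]:]


# Toy model: 4 states on a ring; action 1 advances, action 0 stays,
# and only leaving state 3 is rewarded.
def ring_step(s: int, u: int) -> int:
    return (s + u) % 4


def ring_reward(s: int, u: int) -> float:
    return 1.0 if s == 3 and u == 1 else 0.0


policy = value_iteration(range(4), (0, 1), ring_step, ring_reward, alpha=0.9)
print(control_cycle(policy, ring_step, s0=0))  # [(0, 1), (1, 1), (2, 1), (3, 1)]
```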

IV. "COUNTER WAVES" FIXED CONTROL POLICY
The MDP model allowed us to identify the following CW fixed control policy, which minimizes the maximum latency in the network. The CW pattern design is based on optimizing the network's longest route; the remaining nodes are then synchronized with the nodes of the longest route according to their distance from the root. To be applied on a loop, a fixed policy of length L must take each backhaul-providing sector through all four modes (send/receive, access/backhaul) at least once; therefore, L ≥ 4. We set L = 4 and find the controls for four consecutive decision epochs, time steps 4n, …, 4n + 3, n = 0, 1, …, as follows. First, we create a continuous uplink "wave" on the class N (longest) route, advancing one hop per time step. Second, we fit in a continuous downlink "wave" on the class N route. Third, we fill the gaps on the class N route with access modes. Fourth, we synchronize all the other nodes with the class N route by their tree levels, in both the send/receive and the access/backhaul modes.
To show that the CW pattern can be obtained for any IAB network, in Fig. 2 we consider a "chain" of N nodes.First, we let N = 2 and build the CW pattern for such a network starting the uplink wave (in blue) at times 4n.This is represented by the last two nodes on the right in Fig. 2. The corresponding controls are given in the figure by the last two columns for control a and the last column of control b.
Now, we add one node to the network, make this node the donor, and increase the indices of the two previously considered nodes by one. The resulting network is represented by the last three nodes on the right in Fig. 2. We see that the CW pattern can be extended to the network of three nodes without changing the controls for the two nodes considered previously: we only need to set the controls for the newly added node so that they fit the established waves. For N = 3, the controls are given by the last three columns in a and the last two columns in b. By repeating the procedure further, we notice that the pattern and controls repeat themselves every four nodes. We denote the repeating segments of the controls by the matrices A and D (16), which suffice to design a CW pattern for any tree-shaped IAB network with multi-sector nodes, regardless of the number of sectors. Indeed, consider an IAB network with multi-sector nodes, and let its topology be given by T. Use T to obtain $\mathcal{N}_k$, $k \in \mathcal{N}$. Then, Algorithm 1 yields the CW control sequence $(a^{(l)}, b^{(l)})_{l=1,\dots,4}$ for this network. The algorithm does not require signaling: it accepts the topology as input and computes the fixed CW link control policy, which can be used as long as the topology remains unchanged.
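The fourth design step, synchronizing the remaining nodes with the longest route by tree level, can be sketched as below. The per-phase route pattern used here is a placeholder invented for illustration and is not the CW pattern given by the matrices A and D; the function names and the tree are also hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Schedule = List[Tuple[List[int], List[Optional[int]]]]


def depths(par: Dict[int, int], N: int) -> Dict[int, int]:
    """Hop distance to the donor; parents have smaller indices, so one pass suffices."""
    d = {1: 0}
    for i in range(2, N + 1):
        d[i] = d[par[i]] + 1
    return d


def synchronize(route_a: List[List[int]], route_b: List[List[Optional[int]]],
                par: Dict[int, int], N: int) -> Schedule:
    """Give every node the controls of the longest-route node at the same tree level.

    route_a[l][d]: transmit (1) / receive (0) mode of the route node at depth d in phase l.
    route_b[l][d]: whether the route edge ending at depth d >= 1 is enabled in phase l.
    """
    d = depths(par, N)
    schedule = []
    for a_l, b_l in zip(route_a, route_b):
        a = [a_l[d[i]] for i in range(1, N + 1)]
        b = [b_l[d[i]] for i in range(2, N + 1)]  # b_{i-1}: edge from node i to its parent
        schedule.append((a, b))
    return schedule


# Placeholder per-phase pattern for a route of depth 2 (NOT the CW matrices A and D).
route_a = [[0, 1, 0], [1, 0, 1], [0, 0, 0], [1, 1, 1]]
route_b = [[None, 1, 1], [None, 1, 1], [None, 0, 0], [None, 0, 0]]
par = {2: 1, 3: 1, 4: 2}  # hypothetical tree; node 4 ends the longest route
schedule = synchronize(route_a, route_b, par, N=4)
for a, b in schedule:
    print("a =", a, "b =", b)
```

Because each node copies the route node at the same depth, every tree edge between depths d − 1 and d has the same transmit/receive configuration as the corresponding route edge, so a half-duplex-consistent route pattern remains consistent on the whole tree.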
The maximum packet latency under the CW policy is $\tau = N_N + 4$. This bound cannot be improved under the full-buffer assumption. Indeed, τ consists of the traveling time, $N_N$, which cannot be shortened under our assumptions regarding network operation, and the worst-case waiting time L = 4. To reduce the waiting time, e.g., in the downlink, one could launch the downlink wave more often. However, the downlink wave must go all the way to sector 1 of node N and occupy two TDM phases of this sector. This necessarily results in additional uplink waiting for the sector's UEs, so the maximum latency in the network would increase.
TABLE I: DEFAULT SYSTEM PARAMETERS
Fig. 3. The end-to-end latency for the network of Fig. 1 under the CW, BP, and backpressure with priorities to overloaded links (BP-OP, [12]) control policies vs. λ. The BP policy is known to be throughput-optimal; therefore, its stability region indicates the capacity region of the system [13].

V. NUMERICAL RESULTS
To evaluate the CW fixed policy numerically, we set the access links' capacities according to the maximum data rate formula of TS 38.306 [14] with the parameter values given in Table I. The backhaul links' capacities are then set to 1.8 times the downlink access capacity $C^{\mathrm{DL}}_{\mathrm{AC}}$. This yields rates of approximately 100, 540, and 970 packets per time step in the access uplink, access downlink, and backhaul, respectively. We consider a typical case of a downlink-dominated system: the numbers of arriving packets per sector are independent and Poisson distributed with rates λ in the uplink and 4λ in the downlink. All buffers are unlimited, the packet size is 1500 bytes, and the time step is 1 ms. Fig. 3 shows the mean and the 95th percentile of the end-to-end packet latency as a function of λ for the network in Fig. 1 under CW, the well-known backpressure (BP) control policy [13], and its latency-focused modification [12]. The BP algorithm here illustrates the capacity region of the system [13] and is included to show the applicability region of the proposed policy. Observe that CW provides predictable and low latency over approximately 60% of the capacity region, outperforming both considered dynamic algorithms. The latency remains flat as long as the queues are stable.
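The capacity setting can be reproduced approximately with the single-carrier maximum data rate formula of TS 38.306. The parameter values below (FR2 with 120 kHz subcarrier spacing and 264 PRBs, four downlink layers with 64QAM, one uplink layer with 16QAM, and the FR2 overheads 0.18 and 0.10) are our own illustrative assumptions, not necessarily the values of Table I; the function names are hypothetical.

```python
def nr_max_rate_bps(layers: int, qm: int, scaling: float, n_prb: int,
                    mu: int, overhead: float) -> float:
    """Single-carrier NR data rate in bit/s (TS 38.306 formula without the 1e-6 factor)."""
    r_max = 948 / 1024                # maximum code rate
    t_s = 1e-3 / (14 * 2 ** mu)       # average OFDM symbol duration for numerology mu
    return layers * qm * scaling * r_max * (n_prb * 12) / t_s * (1 - overhead)


def packets_per_step(rate_bps: float, packet_bytes: int = 1500, step_s: float = 1e-3) -> float:
    """Number of packets of the given size that one link carries in one time step."""
    return rate_bps * step_s / (8 * packet_bytes)


# Assumed FR2 parameters: 120 kHz SCS (mu = 3), 264 PRBs (400 MHz), single carrier.
dl = packets_per_step(nr_max_rate_bps(layers=4, qm=6, scaling=1.0, n_prb=264, mu=3, overhead=0.18))
ul = packets_per_step(nr_max_rate_bps(layers=1, qm=4, scaling=1.0, n_prb=264, mu=3, overhead=0.10))
bh = 1.8 * dl                         # backhaul set to 1.8 x downlink access capacity
print(f"UL {ul:.1f}, DL {dl:.1f}, BH {bh:.1f} packets per time step")
```

Under these assumptions, the sketch gives roughly 99, 539, and 970 packets per time step, close to the rates quoted above; the 1500-byte packet size and the 1 ms time step are taken from the text.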
To understand the performance of the proposed policy for different topologies, Fig. 4 depicts the mean latency as a function of λ for a "chain" IAB network with N = 2, 3, 4, 5. Under the adopted assumptions, CW provides consistently low latency up to the boundary of its stability region, which constitutes about half of the capacity region of the system. We note that in the identified sub-regions, the IAB system does not need signaling to provide such performance.
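The arrival model and the 15λ-per-node figure quoted for Fig. 4 (three sectors, each with rate λ in the uplink and 4λ in the downlink) can be checked with a short simulation. Python's standard library has no Poisson sampler, so the sketch uses Knuth's multiplication method, which is our choice; the network size and λ are hypothetical.

```python
import math
import random
from typing import Dict, Tuple


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for moderate rates (lam well below ~700)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def exogenous_arrivals(N: int, lam: float,
                       rng: random.Random) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Per-sector (uplink, downlink) packet counts for one time step: rates lam and 4 * lam."""
    return {(i, m): (poisson(lam, rng), poisson(4 * lam, rng))
            for i in range(1, N + 1) for m in (1, 2, 3)}


def total_rate(N: int, lam: float) -> float:
    """Mean exogenous packets per time step: 3 sectors x (lam + 4 lam) = 15 lam per node."""
    return N * 3 * (lam + 4 * lam)


rng = random.Random(1)
lam = 2.0
steps = [exogenous_arrivals(3, lam, rng) for _ in range(2000)]
mean_total = sum(ul + dl for step in steps for ul, dl in step.values()) / len(steps)
print(total_rate(3, lam), round(mean_total, 1))
```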

VI. CONCLUSION
The 3GPP IAB technology requires network control policies to optimize end-to-end latency performance.To minimize latency in the in-band backhauling regime with half-duplex constraints, in this letter, we utilized the MDP formalism to identify a general fixed control policy that does not require signaling and provides predictable and low latency.The proposed policy can be applied to low-to-medium traffic conditions covering up to 60% of the capacity region of the system.

Fig. 1. An example IAB network with three-sector nodes and an applied control.

Fig. 2. The Counter Waves (CW) pattern design for a chain network of any length and derivation of matrices A and D (16). The uplink "wave" is formed by blue arrows, the downlink wave by red arrows. Horizontal arrows indicate backhaul transmission; oblique arrows indicate access.

Fig. 4. Mean end-to-end packet latency vs. λ in chain networks with N = 2, 3, 4, 5 under the CW (solid lines) and backpressure (dashed lines) control policies. Here, each new node adds 15λ to the total rate of exogenous arrivals in the network, which explains the rapid decrease of the capacity region.