Power-Efficient Beam Tracking During Connected Mode DRX in mmWave and Sub-THz Systems

Discontinuous reception (DRX), wherein a user equipment (UE) temporarily disables its receiver, is a critical power saving feature in modern cellular systems. DRX is likely to be aggressively used at mmWave and sub-THz frequencies due to the high front-end power consumption. A key challenge for DRX at these frequencies is blockage-induced link outages: a UE will likely need to track many directional links to ensure reliable multi-connectivity, thereby increasing the power consumption. In this paper, we explore bandit algorithms for link tracking in connected mode DRX that reduce power consumption by tracking only a fraction of the available links, but without adversely affecting the outage and throughput performance. Through detailed, system level simulations at 28 GHz (5G) and 140 GHz (6G), we observe that even sub-optimal link tracking policies can achieve considerable power savings with relatively little degradation in outage and throughput performance, especially with digital beamforming at the UE. In particular, we show that it is feasible to reduce power consumption by 75% and still achieve up to 95% (80%) of the maximum throughput using digital beamforming at 28 GHz (140 GHz), subject to an outage probability of at most 1%.


I. INTRODUCTION
A. Motivation
Mobile wireless communication in the mmWave and sub-THz bands enables multi-Gbps peak throughput, but at the cost of high power consumption in both the radio frequency front-end (RFFE) and digital baseband processing [2]. The high power consumption arises from the need to support a large number of antenna elements at very high sample rates, along with the relative inefficiency of RF components at high frequencies. Indeed, power consumption, particularly for mobile devices, is one of the most significant challenges facing 5G deployments today. For example, power estimates in [3] show that peak mobile RFFE power consumption for a typical 28 GHz device can exceed 1 W, a large portion of the total power budget.
Recently, there has been significant interest in communication above 100 GHz, including the sub-THz and THz bands [4]-[7]. Power consumption issues are likely to become even more acute at these frequencies. For example, a recent power estimate [8] showed that the UE receiver for a New Radio (NR)-like system at 140 GHz would require more than 30 times the power consumption of a receiver at 28 GHz, based on current device performance.
Discontinuous reception (DRX) [9], [10], where a mobile device or UE temporarily disables its RFFE, is one of the most widely used tools to reduce power consumption in mobile devices. The DRX mechanism for the 5G NR standards consists of three modes (states) [11], namely Idle, Connected and Inactive, as opposed to legacy DRX with two states [12]. In this paper, we focus on connected mode DRX, whose operation is illustrated in Fig. 1.
Implementing DRX poses unique challenges in the mmWave and sub-THz bands [13], [15], [16]. Most importantly, mmWave systems communicate using narrow directional beams to overcome the high isotropic path loss [2]. Directional links need to be tracked to detect changes in the handset orientation, as well as link blockages, a key challenge in the mmWave bands [17], [18]. In addition, mobile devices in mmWave cellular systems will likely need to maintain links to multiple cells for macro-diversity [19]. Thus, UEs will likely need to track links from multiple directions from multiple cells during the Beam Measurements and Feedback segments shown in the left panel of Fig. 1. This link tracking reduces the time a UE can turn off its RFFE, thereby creating trade-offs between power consumption, directional tracking and link reliability. For instance, if the UE decides to reduce link tracking to save power (right panel of Fig. 1), how should the UE track the links so that its performance (e.g., outage probability, throughput, etc.) does not suffer severely? As we will show below, the number of beams to track and the rate of blockages both increase in the bands above 100 GHz, making the trade-off even more important in the sub-THz regime.
Fig. 1. Left Panel: connected mode DRX operation [13], [14]. Right Panel: Magnified timeline of a TX beam sweep using SSBs at the gNB, with $N_{\mathrm{SSB}} = 8$. The UE is awake for $K = 4$ SSB time slots in each SSB burst period. If the UE employs digital beamforming, then it can track along all the RX directions in each SSB time slot that it is awake for. With analog beamforming, however, the UE can only track along a single RX direction in a given SSB time slot.
In this paper, we address this question using a two-step approach: (i) given a constraint, $K$, which is related to the number of links to track by a scaling factor (and hence acts as a power constraint), we represent the choice of links to track over time as the outcome of a feasible policy for a multiple-play multi-armed bandit (MP-MAB) problem; and (ii) given a policy for the MP-MAB problem in (i), we then identify the smallest $K$ as the solution to an optimization problem that captures the power-performance trade-off.
Our contributions in this paper are:
1) For connected mode DRX in 3GPP NR, we estimate the UE RFFE power consumption at carrier frequencies of 28 and 140 GHz with analog and digital beamforming architectures, assuming system parameters taken from the 3GPP standard. We show that the directional link tracking measurements are responsible for most of the power consumed in connected mode DRX, especially when the UE tracks the maximum permitted number of links.
2) To reduce power consumption in connected mode DRX, we choose to track only a subset of the links (depending on $K$) at any time. We then cast the choice of links to track as the outcome of a policy for an MP-MAB problem. Then, for a given policy, we identify the smallest $K$ as the solution to an optimization problem that captures the power-performance trade-off.
3) Since it is hard to obtain a statistical characterization of the effects of multipath, mobility and correlated blockages on the link SNRs, it is difficult to formulate a link tracking policy with provable performance guarantees (e.g., sub-linear regret). Hence, we consider four sub-optimal policies and compare their performance using detailed system-level simulations based on the 3GPP NR standards at 28 and 140 GHz, which are representative of a 5G mmWave [20] and a hypothetical 6G sub-THz [6] operating environment, respectively, for both analog and digital beamforming at the UE. While the question of a suitable UE beamforming architecture depends on many factors, such as spectral efficiency, hardware complexity, etc., our focus in this paper is restricted to the RFFE power consumption and link tracking capabilities of each architecture. For each combination of carrier frequency and beamforming architecture, the winning policy (i.e., the best performing policy among our limited selection in our simulations) provides useful performance benchmarks that a well-designed policy can be expected to satisfy.
4) From the winning policies in our simulations, we conclude that a well-designed link tracking policy can realize at least 50% power savings and achieve 85% of the maximum throughput with at most 1% outage probability in a 5G mmWave environment at 28 GHz with analog beamforming at the UE. With digital beamforming in the same environment, more power savings (at least 75%) and a larger fraction (95%) of the maximum throughput can be achieved for the same constraint on the outage probability. At 140 GHz with analog beamforming, none of our policies achieves an outage probability below 1%, as the UE is constrained to track only a small fraction of the total number of links, even at maximum power consumption. This finding highlights the need for standardization efforts at 140 GHz for analog beamforming to be viable, both in terms of power efficiency and performance. On the other hand, digital beamforming in the same environment can save 75% power, while achieving 80% of the maximum throughput for an outage probability of at most 1%.

B. Related Work
DRX for LTE systems was studied in [21], while [13], [15], [16] focused on directional DRX for mmWave systems. These works concentrate on optimizing DRX parameters in the MAC/Radio Resource Control (RRC) layer (e.g., ON Timer, DRX cycle, etc.), whereas in this paper, we focus on the beam management aspect of DRX, which is primarily a physical (PHY) layer issue. Furthermore, [13], [15], [16] also focus on DRX performance during the data arrival phase (i.e., when there are data packets for the UE on the downlink), through metrics like the queuing delay, and the wake-up latency. In contrast, we focus on the pre-data arrival phase (i.e., no data arriving on the downlink), where we are interested in the outage probability, since an outage would lead to a loss of connectivity to the network.
DRX in a multi-connectivity setting is addressed in [22], but for the relatively simple scenario of dual connectivity in LTE systems, where the links are not prone to blocking-induced outages. However, at mmWave and sub-THz frequencies, where blockages are a major impediment, the degree of multi-connectivity is an important system parameter that impacts both the UE performance and the power consumption, especially at sub-THz frequencies where the frequency and severity of blockages are likely to be greater than at mmWave. To the best of our knowledge, this power-performance trade-off and its implications for sub-THz UE beamforming architecture, a key theme of this paper, have not been studied previously.
Beam management has been extensively studied for mmWave systems in [23]- [26]. However, these works focus on beam tracking during data transfer and do not consider the power efficiency of link tracking mechanisms. In terms of scope, the work closest to ours is [14], where connected mode DRX is investigated in the pre-data arrival phase, but in a very simple mobile mmWave environment with only analog beamforming deployed at the UE. Crucially, it does not address the power versus performance trade-off associated with tracking multiple links at the UE. In this paper, we investigate this trade-off using detailed system-level simulations based on the 3GPP NR standards, for a 5G mmWave system at 28 GHz and a hypothetical 6G sub-THz system at 140 GHz, with both analog and digital beamforming at the UE. To the best of our knowledge, this is the first work to investigate the power efficiency of beam management, especially for sub-THz systems.

C. Organization
This paper consists of seven sections. In Section II, we provide an overview of connected mode DRX, along with a model for DRX power consumption. In Section III, we capture the trade-off between power consumption and the measured channel quality in connected mode DRX by formulating the choice of links to track as the outcome of a policy for an MP-MAB problem. In Section IV, we present four sub-optimal, but effective, policies for the MP-MAB problem in Section III. The details of our simulation setup, modeling a 5G mmWave system at 28 GHz and a hypothetical 6G sub-THz system at 140 GHz, are presented in Section V. Simulation results capturing the power-performance trade-off for the policies in Section IV are presented in Section VI, culminating in a discussion on the significance of our results. Finally, Section VII concludes the paper with a summary.

D. Notation
Throughout this paper, vectors and matrices are represented by lower and upper case boldface letters, respectively. All vectors are column vectors, $\mathbf{1}$ denotes the all-one vector, and $(\cdot)^T$ and $(\cdot)^H$ denote the transpose and Hermitian operators, respectively. $\mathbb{R}$ and $\mathbb{C}$ respectively denote the set of real and complex numbers, $\mathbb{P}(\cdot)$ denotes probability, $\mathbb{E}[\cdot]$ the expectation operator, $U[a, b]$ a uniform random variable over $[a, b]$, and $\mathcal{N}(\mu, \sigma^2)$ a normal random variable with mean $\mu$ and variance $\sigma^2$. $\mathrm{Gamma}(a, b)$ and $\mathrm{Beta}(a, b)$ respectively denote a gamma and a beta random variable, whose probability density functions (pdfs) are denoted by $f_{\Gamma}(\cdot)$ and $f_{\mathrm{Beta}}(\cdot)$; $\Gamma(\cdot)$ and $B(\cdot, \cdot)$ denote the gamma and beta functions, respectively. For integers $a_0$ and $b_0$ ($b_0 \neq 0$), $\lfloor a_0/b_0 \rfloor$ and $(a_0 \bmod b_0)$ denote the quotient and remainder of $a_0/b_0$, respectively.

II. ANALYSIS OF 3GPP CONNECTED MODE DRX
We first present an overview of connected mode DRX in the 3GPP NR standard [11]. Consider a UE situated within the coverage area of $N_{\mathrm{cell}}$ gNBs (base stations). Let $N_{\mathrm{TX}}$ and $N_{\mathrm{RX}}$ denote the number of antenna elements at the gNB and the UE, respectively. A key aspect of mmWave and sub-THz communications is the use of beamforming. We will assume that the gNB and the UE transmit and receive using finite beamforming codebooks [27], [28] for channel tracking and synchronization. Without loss of generality, we assume that the codebook sizes at the gNB and the UE equal $N_{\mathrm{TX}}$ and $N_{\mathrm{RX}}$, respectively, which corresponds to one codeword for each orthogonal spatial degree of freedom.

1) Beam Management in Connected Mode DRX: DRX is a MAC layer procedure, which determines when the UE goes to sleep (i.e., turns off its RFFE) to save power. In connected mode DRX, the UE wakes up periodically for the following tasks: (a) beam management (represented by the Beam Measurements and Feedback segments in the left panel of Fig. 1), which determines the link(s) through which the UE maintains connectivity to the network, and (b) reception of notifications, through at least one link in (a), of data arrival on the downlink (represented by the ON Duration segment in Fig. 1). In this paper, we focus on (a) alone, drawing attention to the power-intensive nature of the beam management process, which undermines the effectiveness of DRX as a power-saving mechanism if the UE tracks all the permitted links. To reduce power consumption due to beam management, we consider policies that identify a subset of 'good' links to track, without jeopardizing the UE's ability to receive any incoming data in step (b) above. To study the evolution of our policies with changing link conditions, we assume for simplicity that the UE is in the pre-data arrival phase throughout (i.e., there is no data arriving on the downlink), so that the UE remains in connected mode DRX and beam management happens periodically without being interrupted by the UE going into the Active state.^4
[Table I: Estimated awake time and power consumption in connected mode DRX for the parameters defined in [33, Table IX].]
2) Directional Tracking Using Synchronization Signals: During beam management, we assume that the UE tracks the directional channel quality from the cells via the 5G NR synchronization signal blocks (SSBs) [11]. In the 5G NR system, each gNB periodically transmits a sequence of SSBs (known as an SSB burst) that sweep a set of TX directions [29], as illustrated in the right panel of Fig. 1.
Let $T_{\mathrm{SSB}}$ denote the duration of each SSB, $T_{\mathrm{SS}}$ the SSB burst period, and $N_{\mathrm{SSB}}$ the number of different TX directions swept in each SSB burst period,^5 which depends on the TX codebook as shown in the right panel of Fig. 1. To save power, we assume in Section III that in each SSB burst period, the UE chooses to track $K \leq N_{\mathrm{SSB}}$ SSB time slots (see the right panel of Fig. 1, where $K = 4$ and $N_{\mathrm{SSB}} = 8$). During the other SSB time slots, the UE can go to sleep and save power by switching off its RFFE.
3) Network Model With Carrier Aggregation: Resilience to blockage at mmWave frequencies necessitates macro-diversity, i.e., the UE must be connected to multiple cells [19], [30]. To this end, we assume that the UE is connected to all $N_{\mathrm{cell}}$ gNBs via carrier aggregation, a key feature in 3GPP systems that enables simultaneous connections to multiple cells [31]. The cells may operate either in different component carriers or within the same component carrier;^6 the analysis in this paper is identical in both cases. We also assume that the cells are synchronized so that the SSB time slots from different cells are aligned.

B. Power Consumption in Connected Mode DRX
The UE power consumption in connected mode DRX depends on the choice of beamforming architecture (analog or digital). At a carrier frequency $f_c$ with beamforming scheme $\mathrm{BF} \in \{\mathrm{Analog}, \mathrm{Digital}\}$, a UE in connected mode DRX in the pre-data arrival phase will need to periodically wake up for the following three events: (1) ...
In Table I, we present estimates of $P^{f_c,\mathrm{BF}}_{\mathrm{BM}}$, $P^{f_c,\mathrm{BF}}_{\mathrm{BR}}$ and $P^{f_c,\mathrm{BF}}_{\mathrm{LS}}$, based on [8]. For the expressions of each of these quantities, we refer the reader to [33, Appendix], which is an unabridged version of this paper. Based on Table I, we make the following remarks:
Remark 1 (Digital beamforming with low-resolution ADCs): Conventionally, digital beamforming is believed to be more power hungry than analog beamforming due to the presence of multiple RF chains. However, a major source of the increased power consumption is the high resolution of the ADCs [34]. Recent works [3], [8] ...
Remarks 2 and 3 motivate the need for the UE to reduce the number of links to track in connected mode DRX in order to save power and thereby preserve the effectiveness of DRX as a power saving mechanism. Thus, in the next section, we restrict the number of links that can be tracked at any time and then represent the choice of links to track over time as the outcome of a feasible policy for an MP-MAB problem.
^4 Even with data traffic, the UE still needs to periodically track links [29]. The instances during which link tracking occurs may change with traffic patterns, but the proposed smart policies will still enable the UE to track links in a more power-efficient fashion whenever link tracking takes place. Hence, we believe that our results are applicable even in the presence of data.
^5 In general, it is not necessary for the gNB to sweep through all the TX directions in its codebook in every SSB burst period, and thus, $N_{\mathrm{SSB}} \leq N_{\mathrm{TX}}$.
^6 According to [32], the UE can track up to 21 inter- and intra-carrier frequency cells.

III. PROBLEM FORMULATION
We index the SSB burst periods by $t = 0, 1, \ldots$, and let $\gamma_{ilk}(t)$ denote the measured channel quality (i.e., SNR) from cell $i = 1, \ldots, N_{\mathrm{cell}}$, in TX direction $l = 1, \ldots, N_{\mathrm{TX}}$, and RX direction $k = 1, \ldots, N_{\mathrm{RX}}$. $\gamma_{ilk}(t)$ depends on the UE motion, blocking, small-scale fading, and other channel characteristics. Henceforth, we refer to the triplet $(i, l, k)$ as a link.
Let $\mathcal{A}(t)$, which we refer to as the tracking set, denote the set of links over which the UE chooses to measure $\gamma_{ilk}(t)$ in the $t$-th SSB burst period. In general, $\mathcal{A}(t)$ can be viewed as the outcome of a policy, $\Pi(\cdot)$, as expressed in (2): at each SSB burst period $t$, the choice of links to track is a function of past decisions. The links in $\mathcal{A}(t)$ depend on: (a) the SSB time slots the UE is awake for in the $t$-th SSB burst period (which fixes the TX directions at the gNBs), and (b) the RX directions along which the UE measures, which is a function of the beamforming architecture. For instance, with a single RF chain and analog beamforming, the UE can only measure the channel quality along a single RX direction in each SSB time slot (i.e., TX direction) that it is awake for in the $t$-th SSB burst period. With carrier aggregation, the UE can thus track each of the $N_{\mathrm{cell}}$ links corresponding to the TX-RX direction pair associated with an awake SSB time slot. On the other hand, with $N_{\mathrm{RX}}$ RF chains and fully digital beamforming, the UE can measure the channel quality along all $N_{\mathrm{RX}}$ RX directions in each SSB time slot that it is awake for in the $t$-th SSB burst period. Therefore, with carrier aggregation, the UE can track each of the $N_{\mathrm{cell}} N_{\mathrm{RX}}$ links associated with an awake SSB time slot.
To reduce power consumption, we limit the number of awake SSB time slots in an SSB burst period to $K$ ($1 \leq K \leq N_{\mathrm{SSB}}$). Thus, the number of links that a UE can track in each SSB burst period, which is denoted by $L$, depends on $K$ in the following manner:
$$L = \begin{cases} K N_{\mathrm{cell}}, & \text{analog beamforming}, \\ K N_{\mathrm{cell}} N_{\mathrm{RX}}, & \text{digital beamforming}. \end{cases} \tag{3}$$
Eqn. (3) captures the trade-off between power consumption and performance: the size of the tracking set ($L$) increases if the UE is awake for longer (i.e., as $K$ increases), which increases both the power consumption and the UE's probability of tracking the link with the highest SNR.
Remark 4 (Maximum Number of Links a UE is Permitted to Track): Let $L_{\max}$ denote the maximum value of $L$ in (3), which is attained at $K = N_{\mathrm{SSB}}$. $L_{\max}$ is the maximum number of links that a UE is permitted to track in each SSB burst period, which is distinct from the number of available links ($N_{\mathrm{cell}} N_{\mathrm{TX}} N_{\mathrm{RX}}$). In general, $L_{\max} \leq N_{\mathrm{cell}} N_{\mathrm{TX}} N_{\mathrm{RX}}$, since it is not necessary for the gNBs to sweep through all $N_{\mathrm{TX}}$ TX directions in each SSB burst period (i.e., $N_{\mathrm{SSB}} \leq N_{\mathrm{TX}}$). In addition, $L_{\max}$ reduces by a factor of $N_{\mathrm{RX}}$ for analog beamforming, since the UE is constrained to track links along a single RX direction in each SSB burst period.
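As a small numerical illustration of how the tracking set size scales with the number of awake SSB time slots, the following sketch evaluates the relationship described above (one RX direction per awake slot for analog beamforming, all $N_{\mathrm{RX}}$ RX directions for digital beamforming). The function name and the example values for $N_{\mathrm{RX}}$ are illustrative assumptions, not taken from the paper's simulation code.

```python
def links_per_burst(k_slots, n_cell, n_rx, beamforming):
    """Number of links L tracked per SSB burst period for K awake SSB slots.

    Hypothetical helper: with analog beamforming the UE measures one RX
    direction per awake slot; with digital beamforming it measures all
    n_rx RX directions. Carrier aggregation covers all n_cell cells.
    """
    if k_slots < 1:
        raise ValueError("K must be at least 1")
    if beamforming == "analog":
        return k_slots * n_cell
    if beamforming == "digital":
        return k_slots * n_cell * n_rx
    raise ValueError("beamforming must be 'analog' or 'digital'")


# Example with K = 4, N_cell = 9 and an assumed RX codebook size of 8.
print(links_per_burst(4, 9, 8, "analog"))   # 36
print(links_per_burst(4, 9, 8, "digital"))  # 288
```

Setting `k_slots` to $N_{\mathrm{SSB}}$ gives $L_{\max}$ for the chosen architecture, which makes the factor-of-$N_{\mathrm{RX}}$ gap between analog and digital beamforming in Remark 4 explicit.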
Given $K$, we choose $\mathcal{A}(t)$ to maximize the measured SNR. Intuitively, this involves tracking the $L$ strongest links over time, which can be modeled as a multiple-play multi-armed bandit (MP-MAB) problem. In our context, the arms correspond to links and at $t$, a policy $\Pi$ chooses $L$ links (multiple plays) out of a total of $N_{\mathrm{cell}} N_{\mathrm{TX}} N_{\mathrm{RX}}$. Let $\gamma^{\Pi}_{\max}(t)$ denote the reward of $\Pi$, which is given by
$$\gamma^{\Pi}_{\max}(t) = \max_{(i,l,k) \in \mathcal{A}(t)} \gamma_{ilk}(t). \tag{4}$$
The aim of a policy is to minimize the rate of growth of the cumulative regret, $R^{\Pi}(t)$, defined in (5). In choosing $\mathcal{A}(t)$, a policy needs to trade off two competing requirements: exploration and exploitation. The latter, where the UE tracks the $L$ links with the highest (measured) average SNR, helps to minimize the rate of growth of $R^{\Pi}(\cdot)$ in (5). However, the SNR statistics of the links can change with time; hence, the UE needs to track sub-optimal links from time to time (i.e., exploration) to adapt to any changes in the set of the strongest $L$ links. From (5), we can see that formulating a policy with provable performance guarantees (e.g., sub-linear growth of $R^{\Pi}(T)$) for mmWave and sub-THz systems depends on knowing how the joint statistics of the link SNRs (i.e., $\{\gamma_{ilk}(t), \forall\, i, l, k\}$) evolve with time. However, in a multipath environment with mobility, the channel dynamics of a collection of directional links are hard to characterize due to the intermittent nature of the links, as well as correlation across links. Hence, in order to provide useful insights for mmWave and sub-THz systems, such as (a) the fraction of links (i.e., $K/N_{\mathrm{SSB}}$) that need to be tracked for at most 1% outage probability, and (b) the throughput degradation (in case of data arrival) due to tracking only a limited number of links, we consider four policies and evaluate their performance in terms of these criteria using detailed system-level simulations at 28 GHz and 140 GHz, representative of a 5G mmWave and a hypothetical 6G sub-THz operating environment, respectively.
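To make the notions of reward and regret concrete, the sketch below computes the per-period reward as the best SNR within the tracking set and accumulates an empirical regret against an oracle that always tracks the globally best link. The regret form used here (a plain sum of per-period SNR gaps, without an expectation) is an assumption based on the usual bandit definition and may differ from the paper's exact expression; all names are hypothetical.

```python
def reward(snr, tracking_set):
    """Reward of a policy in one SSB burst period: best SNR among tracked links."""
    return max(snr[link] for link in tracking_set)


def empirical_regret(snr_trace, tracking_sets):
    """Sum over time of (best SNR over all links) minus (policy reward).

    Assumed form: the oracle tracks the globally strongest link in every
    period. snr_trace is a list of dicts mapping link id -> SNR, and
    tracking_sets is the list of tracking sets chosen by the policy.
    """
    total = 0.0
    for snr, tracked in zip(snr_trace, tracking_sets):
        total += max(snr.values()) - reward(snr, tracked)
    return total


# Two burst periods, three links, policy tracks links 0 and 2 in both.
trace = [{0: 10.0, 1: 3.0, 2: 7.0}, {0: 2.0, 1: 9.0, 2: 8.0}]
sets = [{0, 2}, {0, 2}]
print(empirical_regret(trace, sets))  # 1.0
```

In the second period the strongest link (link 1) is outside the tracking set, so the policy loses 1 unit of SNR; a policy that never explores would keep accumulating such losses once the set of strongest links changes.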
The policies are (i) the $\epsilon$-greedy algorithm [35], (ii) Thompson sampling [36], (iii) the upper confidence bound (UCB) algorithm [37], and (iv) the $(L - L_m)$ round-robin policy. The first three are well-known policies for solving MAB problems, but they typically assume stationary rewards (i.e., a stationary SNR distribution over each link) and independence across arms (links). Hence, these policies are sub-optimal in our case. The fourth policy is a heuristic adapted from our earlier work [1].
Despite their sub-optimality, comparing the performance of multiple policies and identifying a winner among them based on their relative performance enables us to draw useful conclusions like "a well-designed link tracking policy should simultaneously realize 75% power savings, 95% of the maximum throughput, and < 1% outage probability in a 5G mmWave environment at 28 GHz with digital beamforming at the UE", because the winning policy meets these targets in our realistic system-level simulations. From an engineering perspective, we believe that conclusions like these are valuable, despite lacking in rigor, and are the key results of this paper. In contrast, we stress that conclusions like "the $\epsilon$-greedy policy performs best at 28 GHz with analog beamforming at the UE", drawn only because it happens to be the winning policy in our simulations, are of little significance and we refrain from making them.
We review the policies in the next section. Due to their sub-optimality, we do not delve into a detailed mathematical treatment of the policies; for these, we direct the reader to the references provided. However, we attempt to provide enough intuition behind their working, specifically on their exploration and exploitation mechanisms.

IV. LINK TRACKING POLICIES FOR CONNECTED MODE DRX
Before reviewing the policies, we first define some common quantities. Let $G_n(t) := \{\gamma_n(s) : n \in \mathcal{A}(s),\, s \leq t\}$ denote the set of SNR values measured for the $n$-th link ($n = 1, \cdots, N_{\mathrm{cell}} N_{\mathrm{TX}} N_{\mathrm{RX}}$) up to the $t$-th SSB burst period. Hence, $N_n(t) := |G_n(t)|$ denotes the number of times the $n$-th link has been included in the tracking set up to the $t$-th SSB burst period. Finally, let $\bar{\gamma}_n(t)$ denote the mean measured SNR of the $n$-th link up to the $t$-th SSB burst period (i.e., the sample mean of the elements of $G_n(t)$).
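The per-link count and sample mean defined above do not require storing the full measurement history; they can be maintained incrementally. The following is a minimal sketch of such bookkeeping, with a hypothetical class name and interface.

```python
class LinkStats:
    """Running count N_n(t) and sample mean of measured SNR for each link."""

    def __init__(self, n_links):
        self.count = [0] * n_links
        self.mean = [0.0] * n_links

    def update(self, link, snr):
        """Fold one new SNR measurement of `link` into its running mean."""
        self.count[link] += 1
        # Incremental mean: new_mean = old_mean + (x - old_mean) / n.
        self.mean[link] += (snr - self.mean[link]) / self.count[link]


stats = LinkStats(n_links=3)
for snr in (4.0, 8.0):
    stats.update(0, snr)
print(stats.count[0], stats.mean[0])  # 2 6.0
```

Only the links in the current tracking set are updated in each SSB burst period, so links that are rarely tracked keep a small count, which is what the exploration terms of the policies below react to.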

A. Upper Confidence Bound (UCB) Algorithm
We adapt the policy presented in [37] for multiple-play MAB problems. The UCB link tracking policy is detailed in Algorithm 1. In the initialization phase, the policy measures the SNR of every link once, before deciding which links to track over time. During the operation stage, (6) determines the SSB burst periods when exploration and exploitation take place. After convergence, the first term in (6) helps in continuing to track links with high average SNR (large $\bar{\gamma}_n(t)$), which constitutes exploitation. However, as $t$ increases in (6), the second term eventually becomes large enough for an infrequently tracked (weak) link (i.e., small $N_n(t)$) to be included in the tracking set, which constitutes exploration.
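The selection step of a UCB-style tracking policy can be sketched as follows. The index used here is the classical UCB1 form (sample mean plus $\sqrt{2 \ln t / N_n(t)}$), which is an assumption: it has the two-term structure described above, but the exact expression and scaling in (6) may differ. Function and argument names are hypothetical.

```python
import math


def ucb_tracking_set(mean_snr, counts, t, n_track):
    """Return the indices of the n_track links with the largest UCB index.

    Assumed index: mean_snr[n] + sqrt(2 ln t / counts[n]). Links that were
    never measured get an infinite index so that they are tracked first.
    """
    def index(n):
        if counts[n] == 0:
            return math.inf
        return mean_snr[n] + math.sqrt(2.0 * math.log(t) / counts[n])

    ranked = sorted(range(len(mean_snr)), key=index, reverse=True)
    return ranked[:n_track]


# Equal counts: the bonus is identical, so the strongest links win (exploitation).
print(ucb_tracking_set([5.0, 1.0, 4.9], [10, 10, 10], t=30, n_track=2))      # [0, 2]
# A rarely tracked link gets a large bonus and enters the set (exploration).
print(ucb_tracking_set([5.0, 4.0, 4.9], [1000, 1, 1000], t=2000, n_track=2))  # [1, 0]
```

Because the bonus is added directly to the mean SNR, its effect depends on the units of the SNR (linear or dB); in practice the bonus term would need to be scaled to the reward range, which is a design choice not addressed by this sketch.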

B. $\epsilon$-Greedy Algorithm
The $\epsilon$-greedy link tracking policy is detailed in Algorithm 2. Unlike the UCB policy, it uses only the average link SNRs, $\bar{\gamma}_n(t)$, to determine the tracking set. The exploration and exploitation phases of this policy are determined by a parameter $\epsilon \in (0, 1)$. At each $t$, the policy independently explores with probability $\epsilon$ and exploits with probability $1 - \epsilon$.

Algorithm 1 UCB Link Tracking Policy
Data: $\{\gamma_{ilk}(t) : i = 1, \cdots, N_{\mathrm{cell}};\ l = 1, \cdots, N_{\mathrm{TX}};\ k = 1, \cdots, N_{\mathrm{RX}};\ t = 1, \cdots, T_{\mathrm{traj}}\}$
Input: $K$
Initialization {All the links are measured once initially, $L$ at a time.}
for $t = 1$ to $N_{\mathrm{cell}} N_{\mathrm{TX}} N_{\mathrm{RX}} / L$ do
  • Choose $L$ links that have not been measured
  • Update $\bar{\gamma}_n(t)$ and $N_n(t)$ accordingly
end
Operation
  • Form $\mathcal{A}(t)$ with the links corresponding to the $L$ largest values in (6) {The first term in (6) helps track strong links often (i.e., exploitation). As $t$ increases, the second term eventually becomes large enough for an infrequently tracked (weak) link (i.e., small $N_n(t)$) to be included in the tracking set (i.e., exploration).}

During exploration, the policy uniformly and independently chooses $L$ links to track, while in the exploitation phase, it tracks the $L$ links with the highest $\bar{\gamma}_n(t)$. Initially, $\bar{\gamma}_n(0) = \gamma_{\mathrm{init}}\ \forall\, n$, a suitably large SNR^7 that ensures that the policy tracks each link at least once before converging. The choice of $\epsilon$ is discussed in Section VI.
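The explore/exploit decision of the $\epsilon$-greedy policy can be sketched as below. This is an illustrative sketch rather than the paper's Algorithm 2: the uniform choice of $L$ links is implemented as sampling without replacement, which is an assumption, and the function name and arguments are hypothetical.

```python
import random


def epsilon_greedy_tracking_set(mean_snr, n_track, epsilon, rng=random):
    """Pick n_track links: explore with probability epsilon, else exploit.

    mean_snr[n] is the running mean SNR of link n; unmeasured links are
    assumed to start at a large initial value so that they get tried.
    """
    n_links = len(mean_snr)
    if rng.random() < epsilon:
        # Exploration: n_track distinct links chosen uniformly at random.
        return sorted(rng.sample(range(n_links), n_track))
    # Exploitation: the n_track links with the highest mean SNR.
    ranked = sorted(range(n_links), key=lambda n: mean_snr[n], reverse=True)
    return sorted(ranked[:n_track])


rng = random.Random(0)
print(epsilon_greedy_tracking_set([1.0, 9.0, 5.0, 7.0], 2, 0.0, rng))  # [1, 3]
```

With the optimistic initialization described above (every mean starting at $\gamma_{\mathrm{init}}$), the exploitation branch alone already visits each link at least once, since an unmeasured link keeps the highest possible mean until it is measured.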

C. Thompson Sampling
In contrast to the previous policies, Thompson sampling adopts a Bayesian approach, where the average SNR of the $n$-th link is assumed to be a random variable, $X_n$. Assuming independence across links,^8 let $f(X_n | G_n(t))$ denote the posterior probability density function (pdf) of $X_n$, given the measured values of $\gamma_n$ up to the $t$-th SSB burst period. Starting from a prior pdf $\zeta_n$, $f(X_n | G_n(t))$ is updated using Bayes' rule every time the $n$-th link is included in the tracking set (based on $\bar{\gamma}_n(t)$ and $N_n(t)$). Thus, the policy aims to track the links with the $L$ largest values of the posterior mean $\mathbb{E}[X_n | G_n(t)]$.
The Thompson sampling based link tracking policy, adapted from [36],^9 is presented in Algorithm 3. The performance of this policy depends on the choice of $\zeta_n$ and the resulting $f(X_n | G_n(t))$. To evaluate the latter in a tractable manner, conjugate priors^10 are commonly assumed for $\zeta_n$. The choice of distributions is discussed in Section VI-A.
^7 We assume $\gamma_{\mathrm{init}} = 19.6$ dB, which corresponds to the minimum SNR needed to decode the highest modulation and coding scheme (MCS) level in 3GPP, according to (14).
^8 The SNRs of two or more links are, in general, not independent, due to correlated blocking.
^9 To the best of our knowledge, there exists no formal Thompson sampling policy for MP-MAB problems; hence, we adapt the conventional single-play Thompson sampling policy for multiple plays.

Algorithm 3 Thompson Sampling Based Link Tracking Policy
Data: $\{\gamma_{ilk}(t) : i = 1, \cdots, N_{\mathrm{cell}};\ l = 1, \cdots, N_{\mathrm{TX}};\ k = 1, \cdots, N_{\mathrm{RX}};\ t = 1, \cdots, T_{\mathrm{traj}}\}$
Input: $K$, prior pdf $\zeta_n\ \forall\, n$ {$\zeta_n$ is the initially assumed distribution of $\bar{\gamma}_n(t)$}
Initialization
  • ... by selecting the links corresponding to the $L$ largest values among $\{x_n : n = 1, \cdots, N_{\mathrm{cell}} N_{\mathrm{TX}} N_{\mathrm{RX}}\}$.
Operation
  • Apply the update rule [36] {This updates the posterior distribution of $\bar{\gamma}_n(t)$ based on the measured $\gamma_n(t)$ in $\mathcal{A}(t - 1)$. See [33], [36] for update rules for specific choices of $\zeta_n$.}
  • Sample $x_n \sim f(X_n | G_n(t)),\ \forall\, n$
  • Form $\mathcal{A}(t)$ by selecting the links corresponding to the $L$ largest values among $\{x_n : n = 1, \cdots, N_{\mathrm{cell}} N_{\mathrm{TX}} N_{\mathrm{RX}}\}$. {For a weak link $n$, a large sample value $x_n$ would see it included in $\mathcal{A}(t)$ and thereby constitute exploration.}
  • Go back to Operation
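The sample-then-rank structure of the operation stage can be illustrated with a conjugate model. The sketch below assumes a Gaussian prior on each link's mean SNR with a known measurement variance, chosen purely because its posterior has a simple closed form; it is not the prior discussed in Section VI-A, and the parameter values and names are hypothetical.

```python
import random


def thompson_tracking_set(mean_snr, counts, n_track, prior_mean=0.0,
                          prior_var=100.0, noise_var=4.0, rng=random):
    """Sample each link's mean SNR from its posterior and keep the top n_track.

    Assumed model: X_n ~ N(prior_mean, prior_var) and measurements
    ~ N(X_n, noise_var). The posterior is then Gaussian with
    precision 1/prior_var + N_n/noise_var.
    """
    samples = []
    for m, c in zip(mean_snr, counts):
        precision = 1.0 / prior_var + c / noise_var
        post_mean = (prior_mean / prior_var + c * m / noise_var) / precision
        post_std = (1.0 / precision) ** 0.5
        samples.append(rng.gauss(post_mean, post_std))
    ranked = sorted(range(len(samples)), key=lambda n: samples[n], reverse=True)
    return ranked[:n_track]


rng = random.Random(1)
# Heavily measured links have tight posteriors, so the strongest links are chosen.
print(sorted(thompson_tracking_set([20.0, -10.0, 15.0], [10**6] * 3, 2, rng=rng)))  # [0, 2]
```

A link with few measurements has a wide posterior, so its sample occasionally exceeds those of stronger links and it is pulled into the tracking set; this is the exploration mechanism noted in the algorithm listing.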

D. $(L - L_m)$ Round Robin Policy
Finally, we consider a policy with one-step memory (i.e., $s = t - 1$ in (2)), which is a modified version of the policy presented in [1]. For the current SSB burst period $t$, the policy retains the $L_m$ ($1 \leq L_m < L$) strongest links from $\mathcal{A}(t - 1)$ based on the instantaneous SNR $\gamma_n(t)$ (exploitation) and selects the other $L - L_m$ links independently and uniformly from the remaining links (exploration). We refer to this as the $(L - L_m)$ round robin policy, which is summarized in Algorithm 4. The choice of $L_m$ is discussed in Section VI-A.
^10 For the likelihood function, $f(G_n | X_n)$, $\delta_n$ is a conjugate prior if the posterior pdf, $f(X_n | G_n)$, belongs to the same family of distributions as $\delta_n$.

Algorithm 4 $(L - L_m)$ Round Robin Link Tracking Policy
Data: $\{\gamma_{ilk}(t) : i = 1, \cdots, N_{\mathrm{cell}};\ l = 1, \cdots, N_{\mathrm{TX}};\ k = 1, \cdots, N_{\mathrm{RX}};\ t = 1, \cdots, T_{\mathrm{traj}}\}$
Input: $K$, $L_m$
Initialization
  • At $t = 1$, construct $\mathcal{A}(t)$ by uniformly and independently choosing $L_m$ links to track
Operation
  • ...
  • Go back to Operation
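One step of the retain-and-resample rule described above can be sketched as follows. The sketch assumes that the "remaining links" are all links other than the $L_m$ retained ones and that they are sampled without replacement; both are interpretations, and the function name and arguments are hypothetical.

```python
import random


def round_robin_tracking_set(prev_set, inst_snr, n_track, n_keep, n_links, rng=random):
    """Keep the n_keep strongest links of the previous set, resample the rest.

    inst_snr maps each link in prev_set to its most recent measured SNR.
    The other n_track - n_keep links are drawn uniformly from the links
    that were not retained (an assumed reading of 'remaining links').
    """
    kept = sorted(prev_set, key=lambda n: inst_snr[n], reverse=True)[:n_keep]
    pool = [n for n in range(n_links) if n not in kept]
    explored = rng.sample(pool, n_track - n_keep)
    return kept + explored


rng = random.Random(0)
new_set = round_robin_tracking_set([0, 1, 2], {0: 3.0, 1: 9.0, 2: 5.0},
                                   n_track=3, n_keep=1, n_links=6, rng=rng)
print(new_set[0])  # 1, the strongest link of the previous tracking set
```

Unlike the other three policies, this rule uses only the instantaneous SNR of the previous tracking set and keeps no long-term statistics, which is what the one-step memory refers to.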

V. SIMULATION SETUP
Comprehensive link-level simulations are used to generate the channel trajectories at 28 GHz and 140 GHz. Nine gNBs are deployed (N cell = 9) in a 400 m × 400 m area and the cell radius, r, of each gNB is 100 m. The gNB and the UE heights are set to 10 m and 1.7 m, respectively, in accordance with the 3GPP 'UMi' (urban microcell) specification [38]. At the start of each channel trajectory, the UE's location is chosen according to a two-dimensional uniform distribution that covers the grid.

A. Array Sizes and Beamforming Codebook
Based on [33, Table IX], we consider a 4 × 2 (8 × 8) uniform planar array (UPA) with $\lambda/2$ antenna spacing at the UE and an 8 × 8 (16 × 16) UPA at the gNB for $f_c = 28$ (140) GHz. We assume two identical antenna arrays at both the UE and the gNB for full 360 degree coverage, as in practical devices [39] (i.e., one array covering the front hemisphere and the other the rear). Let $\mathcal{F}_l := \{\mathbf{f}^{(1)}_l, \mathbf{f}^{(2)}_l\}$ ($\mathcal{W}_k := \{\mathbf{w}^{(1)}_k, \mathbf{w}^{(2)}_k\}$) denote the pair of gNB (UE) beamforming vectors corresponding to the $l$-th ($k$-th) TX (RX) direction, where $\mathbf{f}^{(1)}_l, \mathbf{f}^{(2)}_l \in \mathbb{C}^{N_{\mathrm{TX}}}$ ($\mathbf{w}^{(1)}_k, \mathbf{w}^{(2)}_k \in \mathbb{C}^{N_{\mathrm{RX}}}$) correspond to the front and rear antenna arrays, respectively. We consider a simple beamforming codebook based on the steering vector of a UPA, such that the main lobes of the beam patterns cover the hemisphere, equally spaced in both azimuth and elevation. We refer the reader to [33] for the expressions of $\mathbf{f}^{(1)}_l$, $\mathbf{f}^{(2)}_l$, $\mathbf{w}^{(1)}_k$ and $\mathbf{w}^{(2)}_k$.

B. Blockage Modeling
Blockers are placed using a Poisson point process (PPP) [40] with blocker density $\lambda_b = 0.01\ \text{m}^{-2}$. The blockers can be human or vehicular with equal probability. The dimensions and velocities of the blockers are chosen according to 3GPP modeling specifications [38], i.e., the height and width for a human (vehicular) blocker are 1. We choose the Double Knife Edge Diffraction (DKED) model¹¹ for calculating the blockage loss since, to the best of our knowledge, there is no parametric blockage model for 140 GHz, whereas the DKED model is physics-based and holds for all frequencies. Moreover, measurement results in [42] show that the DKED model is within a few dB of the blockage loss at mmWave frequencies. Hence, we believe that the DKED model would provide a reasonably accurate estimate of the blockage loss at 140 GHz as well.

C. Non Line-of-Sight Paths
Stationary reflectors with density $\lambda_r = 0.01\ \text{m}^{-2}$ are also deployed according to a PPP. We assume that the reflectors do not cause blockage. We form clusters of reflectors based on their proximity [38] and select up to $N_{Cluster}$ clusters in increasing order of the gNB → cluster → UE path length [43], in addition to the line-of-sight path. The reflection loss suffered by the signal is taken to be 7 (10) dB at 28 (140) GHz [44].¹²

D. Mobility
The reflectors are assumed to be static. The UE and blockers follow the random waypoint mobility model [45]. Under this model, let $x_t \in \mathbb{R}^2$ represent the position of an object during the $t$-th SSB burst period. At the next SSB burst period, the position is updated as follows:

$$x_{t+1} = x_t + \dot{x}\, T_{SS}, \tag{7}$$

where $\dot{x}$ is the velocity of the object and $T_{SS}$ is the duration between successive SSB burst periods. The x and y components of the velocity vector are independently chosen according to the distributions in Table II. The UE and the human blockers have similar mobility characteristics, since their velocities are drawn from the same distribution; vehicular blockers have a different velocity distribution. A destination is associated with every mobile object at the start of each simulation, and is changed when the object in question reaches it. The destinations of the objects are restricted to the simulation grid.

¹¹ DKED is also known as Blockage Model B in 3GPP specifications [38] and is used in the METIS project [41].
¹² In general, the reflection loss depends on the material and the angle of incidence. For the sake of simplicity, we do not consider these effects.
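The per-burst position update of the random waypoint model can be sketched as follows. This is a simplified illustration under stated assumptions: the object moves at a scalar `speed` straight toward its destination (the paper instead draws the x and y velocity components independently from the distributions in Table II), the grid is taken to be a square of side `grid`, and the function and parameter names are hypothetical.

```python
import math
import random

T_SS = 0.02  # SSB burst period in seconds (20 ms, as stated in Section V-E)


def waypoint_step(pos, dest, speed, grid, rng=random):
    """Advance one object by one SSB burst period toward its destination.

    Returns the new position and the (possibly re-drawn) destination.
    """
    dx, dy = dest[0] - pos[0], dest[1] - pos[1]
    dist = math.hypot(dx, dy)
    step = speed * T_SS  # distance covered in one burst period
    if dist <= step:
        # Destination reached: snap to it and draw a new destination
        # uniformly inside the (assumed square) simulation grid.
        new_dest = (rng.uniform(0.0, grid), rng.uniform(0.0, grid))
        return dest, new_dest
    # Otherwise move a distance `step` along the straight line to dest.
    return (pos[0] + step * dx / dist, pos[1] + step * dy / dist), dest
```

Calling `waypoint_step` once per SSB burst period for the UE and for every blocker yields the position sequences from which blockage events can be derived.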

E. SNR Calculation
Let $H_i(t) \in \mathbb{C}^{N_{RX} \times N_{TX}}$ denote the channel gain matrix between the $i$-th gNB and the UE at the $t$-th SSB burst period, due to $N_{Cluster}$ multipath components (MPCs). For the $p$-th MPC, let $\alpha_{i,p}(t)$, $\rho_{i,p}(t)$ and $d_{i,p}(t)$ denote the blockage loss, reflection loss and path length, respectively, alongside its angle of arrival and angle of departure. $H_i(t)$ is expressed in terms of these quantities and the UPA steering vectors $u(\cdot)$ and $v(\cdot)$ at the UE and gNB, respectively [33], together with the UE speed $\dot{x}_{UE}$, the carrier frequency $f_c$ and the speed of light $c$. The SNR $\gamma_{ilk}(t)$ additionally depends on Boltzmann's constant $k_B$, the system bandwidth $B$, the noise figure $N_F$ and the temperature $T_0$.

We generate a total of 100 channel trajectories. For each trajectory, we simulate 3000 SSB burst periods with $T_{SS} = 20$ ms, amounting to a runtime of 60 s per trajectory ($T_{traj}$). The parameter values used to generate the simulation data are listed in Table II.

An inherent weakness of our simulation-based approach is that the results in the following section depend on the values in Table II and could change for a different set of parameter values. To mitigate this, we have chosen values that are consistent with the 3GPP standards and/or backed by empirical evidence. While this does not guarantee that the parameter values will remain fixed forever, we believe that it provides a reasonable safeguard against large changes, which in turn lends weight to the conclusions that we draw from our simulation results.

VI. RESULTS AND DISCUSSION
At mmWave and sub-THz frequencies, the link SNRs in an urban microcell environment are expected to be non-stationary due to UE mobility and the blockage losses caused by mobile blockers. To gain insight into the timescale of SNR variation, we first simulated a single gNB-UE link with a horn antenna configuration at both endpoints, wherein the UE's starting location was kept the same for all trajectories and its movement was confined to the line connecting it to the gNB by a one-dimensional version of the mobility model in (7). This restriction ensures the same link (i.e., the same beam directions at the UE and the gNB) over all trajectories at all times. For this setup, the evolution of the average link SNR in increments of $T_{SS}$ (i.e., the duration between successive SSB burst periods) is plotted in Fig. 2. The fairly large fluctuations in the average SNR over relatively short time scales confirm the non-stationarity of the operating environment and reinforce the difficulty of formulating a link tracking policy with provable performance guarantees.
In the pre-data arrival phase, the outage probability is an important performance metric that affects both the throughput and the latency,¹³ in case there is data arrival on the downlink. The outage probability for a policy $\Pi$, denoted by $P^\Pi_{out}$, is defined as follows:

$$P^\Pi_{out} := \Pr\left(\gamma^\Pi_{max}(t) < \gamma_{tgt}\right), \tag{10}$$

where $\gamma_{tgt}$ is the minimum SNR required to communicate using MCS 0. The value of $\gamma_{tgt}$ can be obtained from (14) for $u = 0$. In Sections VI-A and VI-B, we analyze the outage performance of the policies discussed in Section IV, for analog and digital beamforming at the UE, at 28 and 140 GHz. The purpose of this analysis is to quantify the amount of power the UE can potentially save, subject to an outage probability of at most 1%.
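In simulation, the outage probability of a policy can be estimated empirically as the fraction of SSB burst periods, pooled over all trajectories, in which the best tracked SNR falls below the target. The sketch below assumes this pooled-average estimator and uses a hypothetical function name; the target value in the usage note is illustrative only and is not the $\gamma_{tgt}$ of the paper.

```python
def outage_probability(gamma_max_db, gamma_tgt_db):
    """Empirical outage probability of a tracking policy.

    gamma_max_db: list of trajectories, each a list of the best tracked
                  SNR (in dB) per SSB burst period
    gamma_tgt_db: minimum SNR (in dB) needed to communicate
    """
    # Pool the samples of all trajectories and all burst periods.
    samples = [g for trajectory in gamma_max_db for g in trajectory]
    # Count the periods in which the best tracked link is below target.
    return sum(g < gamma_tgt_db for g in samples) / len(samples)
```

For example, `outage_probability([[10, -8, 3], [-7, 5, 6]], -5.0)` counts two outage periods out of six.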

A. Parameter Tuning for the ε-Greedy, Thompson Sampling and $(L - L_m)$ Round Robin Policies
We begin by tuning the parameters of the ε-greedy, Thompson sampling and $(L - L_m)$ round robin policies for each of the four scenarios to find the parameter values that yield the best results for our simulation setting. The tuned parameter is the one that results in the smallest sum of $P^\Pi_{out}$ over $K$ in our simulations.
The tuned values of ε, denoted by $\epsilon_{tuned}$, over the interval [0.001, 0.5]¹⁴ are listed in Table III for the four scenarios. Similarly, we test different priors for Thompson sampling [36] to find the tuned priors $\zeta_{tuned}$ given in Table III. For the $(L - L_m)$ round robin policy, over $1 \le L_m < L$, we observed that increasing $L_m$ leads to a lower $P^\Pi_{out}$ for a given $K$, but we did not observe any significant improvement for $L_m > 3L/4$. Hence, we consider $L_m = 3L/4$ for each case from here on. For a detailed discussion on parameter tuning for each of the policies, see [33].

B. Outage Performance of Policies
We proceed to compare the outage performance of the tuned policies in connected mode DRX against a "Genie" benchmark, which represents the ideal case where the UE knows (and thus, tracks) the link with the highest SNR at each SSB burst period. Let $\gamma_g(t)$ denote the genie SNR during the $t$-th SSB burst, which is given by

$$\gamma_g(t) = \max_{i,l,k} \gamma_{ilk}(t). \tag{11}$$

Similar to (10), the genie outage probability, denoted by $P^g_{out}$, has the following expression:

$$P^g_{out} := \Pr\left(\gamma_g(t) < \gamma_{tgt}\right). \tag{12}$$

At 28 GHz with analog beamforming, we see from Fig. 3a that $K = 8$, which translates to the UE being awake for only 12.5% of the time during an SSB burst period, results in an outage probability of less than 1% for all policies. For digital beamforming, the power savings are greater, as $K = 4$ (i.e., 6.25% awake time) is sufficient to achieve the same outage performance for all the policies (see Fig. 3b).

¹³ A UE in outage needs to trigger initial access procedures to re-establish connectivity to the network, which incurs some delay and adds to the overall latency.
¹⁴ Over the 3000 SSB burst periods in each simulated trajectory, the probability that the ε-greedy policy explores at least once is $1 - (1 - \epsilon)^{3000} \approx 0.95$ for ε = 0.001. Hence, ε ≥ 0.001 provides a high probability for the ε-greedy policy to explore at least once after convergence in our simulations. We restrict the upper limit of ε to 0.5, since we wish our policy to exploit what it has learnt at least half the time.
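The genie benchmark and the arithmetic quoted in this subsection can be checked with a short sketch. The helper names are hypothetical, and the value of 64 SSBs per burst at 28 GHz is an inference from the statement that $K = 8$ corresponds to 12.5% awake time; it is not quoted from the text.

```python
def genie_snr(snr_by_link):
    """Genie SNR for one burst period: the best SNR over all links."""
    return max(snr_by_link.values())


def awake_fraction(K, n_ssb):
    """Fraction of an SSB burst period during which the UE is awake."""
    return K / n_ssb


def explore_at_least_once(eps, n_periods):
    """Probability that an eps-greedy policy explores at least once."""
    return 1.0 - (1.0 - eps) ** n_periods


# 28 GHz (assumed 64 SSBs per burst): analog K = 8, digital K = 4.
print(awake_fraction(8, 64), awake_fraction(4, 64))
# 140 GHz (128 SSBs per burst): sleep fraction for digital K = 16.
print(1.0 - awake_fraction(16, 128))
# Exploration probability over 3000 burst periods with eps = 0.001.
print(round(explore_at_least_once(0.001, 3000), 3))
```

The printed values reproduce the 12.5%, 6.25% and 87.5% figures and the exploration probability of about 0.95 quoted above.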
At 140 GHz, we see from Fig. 4 that the performance gap between analog and digital beamforming is much larger than at 28 GHz. In particular, for analog beamforming with $K = N_{SSB} = 128$ (i.e., the UE is awake for the whole SSB burst period), the outage probability is still more than 1% for all the link tracking policies (Fig. 4a). On the other hand, for digital beamforming, $K = 16$ (i.e., the UE can sleep for 87.5% of the time during an SSB burst period) results in an outage probability of less than 1% for all policies (see Fig. 4b). We make the following remarks on the poor performance of the policies for the analog beamforming case at 140 GHz.
The value of κ for the four cases is listed in
The winning policy in our simulations for each case, defined as the one with the smallest sum of $P^\Pi_{out}$ across $K$, is reported in Table V and is used to analyze the power versus throughput trade-off in the next subsection.

C. Power Vs. Throughput Trade-off
In this section, we formulate an optimization problem that captures the trade-off between the UE power consumption and throughput. We use the spectral efficiency $\eta$ of the MCS levels to map $\gamma^\Pi_{max}$ to throughput¹⁵ using [48]. The loss factor $\Delta$ is a measure of how far the system is operating from Shannon capacity; the value $\Delta = 2$ (3 dB) is in accordance with [3], [43], [49]. We define $p_{uK}$ as the probability that MCS $u$ is supported when the UE is awake for $K$ SSBs. The expected spectral efficiency $E[\eta_{uK}]$ for MCS $u$ and awake time $K$ is obtained from $p_{uK}$. The power-throughput trade-off is then captured by the optimization problem (17)-(18). In (17), $\delta > 0$ is a tunable parameter that can be used to penalize power consumption; e.g., for a UE operating in low-power mode, a large $\delta$ is appropriate, whereas for a fully charged device anticipating high-throughput traffic, a low $\delta$ may be suitable. The constraint (18) ensures that the optimal solution supports the chosen MCS with a minimum probability of $P_o$.

¹⁵ Since the UE is in the pre-data arrival phase, this is, strictly speaking, the anticipated throughput based on $\gamma^\Pi_{max}$, in case there is data arrival. For convenience, we continue to refer to it as throughput in the paper.

[Figure: Pareto boundaries for analog and digital beamforming at 28 and 140 GHz.]

We see that the 28 GHz system achieves a peak spectral efficiency of 4.2 bps/Hz, which reduces to 2.8 bps/Hz at 140 GHz. This is due to the higher blockage and reflection loss at sub-THz frequencies.
The choice of $\delta$ determines the operating point on the Pareto boundary of the power-throughput trade-off curve, which is shown for analog and digital beamforming at 28 and 140 GHz in Fig. 5 for $P_o = 99\%$. The curve corresponding to analog beamforming at 140 GHz is missing in Fig. 5b, since the optimization problem in (17)-(18) is infeasible for this case for $P_o = 99\%$, as seen in Fig. 4. The trade-offs at the feasible 'knee points' are presented in Table VI.
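The structure of the trade-off described above can be illustrated with a small brute-force search over MCS index $u$ and awake time $K$. Everything below is a sketch under loudly stated assumptions, not the formulation of (17)-(18): the SNR threshold for an MCS is assumed to follow a Shannon-with-loss mapping $\Delta(2^{\eta} - 1)$, the expected spectral efficiency is assumed to be $\eta_u\, p_{uK}$, and the objective is assumed to be that quantity minus $\delta$ times a relative power cost. All function and parameter names are hypothetical.

```python
DELTA = 2.0  # loss factor (3 dB), the value quoted in the text


def required_snr(eta):
    """Assumed mapping: linear SNR needed for spectral efficiency eta."""
    return DELTA * (2.0 ** eta - 1.0)


def best_operating_point(gamma_max, etas, power, delta, P_o):
    """Search over (u, K) for the best penalized spectral efficiency.

    gamma_max: dict K -> list of linear SNR samples of the best tracked link
    etas:      spectral efficiency of each MCS index u
    power:     dict K -> relative power cost of being awake for K SSBs
    Returns (score, u, K), or None if no pair meets the constraint.
    """
    best = None
    for K, samples in gamma_max.items():
        for u, eta in enumerate(etas):
            threshold = required_snr(eta)
            # Empirical probability that MCS u is supported for this K.
            p_uK = sum(g >= threshold for g in samples) / len(samples)
            if p_uK < P_o:
                continue  # MCS u is not supported reliably enough
            # Assumed objective: expected efficiency minus power penalty.
            score = eta * p_uK - delta * power[K]
            if best is None or score > best[0]:
                best = (score, u, K)
    return best
```

Sweeping `delta` from small to large values and recording the selected `(u, K)` pairs traces out a trade-off curve of the kind discussed here; a return value of `None` corresponds to an infeasible case.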
Remark 7 (Implications of Table VI): From an engineering perspective, Table VI enables us to draw useful conclusions like "a well-designed link tracking policy should simultaneously realize 50% power savings, 85% of the maximum throughput, and < 1% outage probability in a 5G mmWave environment at 28 GHz with analog beamforming at the UE" (and so on, for the other cases), since these benchmarks are achieved by the winning policy among a small collection of sub-optimal policies. In our opinion, conclusions like these are the key contributions of this paper.

D. Closeness to Optimality and Convergence Time
While Fig. 5 provides an insight into the average throughput that can be achieved by the policies in Table V, it does not indicate how closely and quickly their tracking performance approaches that of Genie. In Fig. 6, we plot the CDFs of $\gamma^\Pi_{max}$ corresponding to the knee points labelled in Fig. 5, along with the Genie CDF. At 28 GHz, digital beamforming with $K = 16$ differs from Genie by only about 0.1 dB at the 50th percentile, while for analog beamforming with $K = 32$, the difference goes up to 2 dB. However, the power consumed by analog beamforming in this case exceeds that of digital beamforming by a factor of 4.5, as shown in Table VI. At 140 GHz, the difference at the 50th percentile is around 2 dB for digital beamforming ($K = 32$).
We define the convergence time of a policy $\Pi$ as the number of SSB bursts required for the ratio $E[\gamma_g(t)/\gamma^\Pi_{max}(t)]$ to be less than 3 dB, where $\gamma_g(t)$ and $\gamma^\Pi_{max}(t)$ are given by (11) and (4), respectively, and the averaging is across simulation trajectories. The convergence time, in terms of the number of SSB burst periods, is listed in Table VI for the winning policies from Table V. We see that convergence is faster with digital beamforming due to the larger fraction of links that it permits the UE to track for a given $K$ (Remark 5).
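The convergence time defined above can be computed from simulated SNR traces as sketched below. The sketch assumes that SNRs are given in linear scale and are strictly positive, that the ratio is averaged in linear scale before conversion to dB, and that convergence is declared at the first burst period at which the averaged ratio drops below the threshold; these choices and the function name are assumptions for illustration.

```python
import math


def convergence_time(genie, policy, threshold_db=3.0):
    """First SSB burst period at which the trajectory-averaged ratio of
    genie SNR to policy SNR falls below threshold_db.

    genie[j][t], policy[j][t]: linear SNRs for trajectory j, burst period t.
    Returns a 1-indexed burst period, or None if never reached.
    """
    n_traj = len(genie)
    n_periods = len(genie[0])
    for t in range(n_periods):
        # Average the linear SNR ratio across trajectories.
        mean_ratio = sum(genie[j][t] / policy[j][t] for j in range(n_traj)) / n_traj
        if 10.0 * math.log10(mean_ratio) < threshold_db:
            return t + 1  # burst periods are counted from 1
    return None
```

A stricter variant could require the ratio to stay below the threshold for all later burst periods; which reading matches the results in Table VI is not specified in the text.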

E. Significance of Simulation Results
• In Section II, we demonstrated the power-intensive nature of beam tracking within connected mode DRX (a mechanism intended to reduce UE power consumption) for mmWave and sub-THz systems at 28 GHz and 140 GHz, respectively (Table I and Remark 2). This is a critical issue from an energy efficiency perspective, which is a key performance metric for 5G and 6G systems [50], [51]. In this context, our results on the magnitude of the UE power savings that can be achieved (50% to 75%) without a large degradation in throughput (based on Table VI and Remark 7) are significant, considering the scale of UEs (billions of devices) expected to be supported by a 5G/6G network.
• Implications for Beamforming Architecture: From Table VI, we observe that a digital beamforming architecture permits more power-efficient link tracking with less throughput degradation than analog beamforming, because the former enables the UE to track a larger fraction of the available links for any $K$, according to (3). This adds to the compelling case for digital beamforming at 28 GHz made in [3], and extends it to 140 GHz, provided the advantages of digital beamforming with low-resolution ADCs extend to 140 GHz as well (Remark 1). With respect to analog beamforming, the especially poor performance of the policies at 140 GHz is due to the extremely small fraction of the available links that the UE is permitted to track at each SSB burst period (Remark 5); hence, it is unlikely that a superior policy would perform any better. From Remark 6, we see that this is due to $N_{SSB}$ (the number of links that the UE is permitted to track) not increasing to an extent commensurate with the pathloss-driven scaling of $N_{TX} N_{RX}$ (the beamforming gain) from 28 GHz to 140 GHz. $N_{SSB}$ is determined by the 3GPP NR standards at 28 GHz, and we make reasonable assumptions for its value at 140 GHz based on the standards (see the Remarks column corresponding to $N_{SSB}$ in [33, Appendix, Table IX]). Hence, for analog beamforming to be viable at 140 GHz in terms of power efficiency and outage/throughput performance, significant standardization efforts are needed to identify the appropriate value(s) of $N_{SSB}$ for 140 GHz.
• Apart from the case of analog beamforming at 140 GHz, Table VI suggests that the power versus performance trade-off can be favorably navigated using the sub-optimal policies considered in our paper. This is significant, since it suggests that considerable power savings can be realized even with unsophisticated policies. Thus, any of these policies could, in principle, be implemented for mmWave UEs with a short development cycle, leading to improved battery life on devices. However, considerable work remains to be done to develop a policy that is well-suited for 5G mmWave and 6G sub-THz systems with provable performance guarantees.

VII. SUMMARY
DRX is likely to be aggressively used in mmWave and sub-THz wireless systems due to the high UE RFFE power consumption, which mainly stems from the need to track multiple links to ensure reliable multi-connectivity in the presence of frequent and severe link blockages. In this paper, we focused on reducing the UE power consumption during connected mode DRX by tracking only a subset of the available links, but without adversely affecting the outage/throughput performance. To achieve this objective, we formulated the choice of links to track over time as the outcome of a feasible policy for an MP-MAB problem. Through detailed system-level simulations at 28 and 140 GHz, modeling a 5G mmWave and a hypothetical 6G sub-THz system, respectively, we observed that even sub-optimal link tracking policies could achieve considerable power savings with relatively little degradation in outage and throughput performance, especially with digital beamforming at the UE.