Balancing Queueing and Retransmission: Latency-Optimal Massive MIMO Design

One fundamental challenge in 5G URLLC is how to optimize massive MIMO systems to achieve low latency and high reliability. A natural design choice to maximize reliability and minimize retransmissions is to select the lowest allowed target error rate. However, the overall latency is the sum of queueing latency and retransmission latency; hence, choosing the lowest target error rate does not always minimize the overall latency. In this paper, we minimize the overall latency by jointly designing the target error rate and the transmission rate adaptation, which identifies a fundamental tradeoff point between queueing and retransmission latency. This design problem can be formulated as a Markov decision process, which is theoretically optimal, but its complexity is prohibitively high for real-system deployments. We develop a low-complexity closed-form policy named Large-arraY Reliability and Rate Control (LYRRC), which is proven to be asymptotically latency-optimal as the number of antennas increases. In LYRRC, the transmission rate is twice the arrival rate, and the target error rate is a function of the number of antennas, the arrival rate, and the channel estimation error. In our evaluations with simulated and measured channels, LYRRC satisfies the latency and reliability requirements of URLLC in all tested scenarios.


I. INTRODUCTION
Next-generation cellular systems, known as 5G, are targeting low latency and ultra-high reliability to support new forms of applications, e.g., mission-critical communications. One of the key technologies for 5G will be massive MIMO, where the base stations are equipped with tens to hundreds of antennas [1]-[4]. In this paper, we explore how to leverage the large number of spatial degrees of freedom to minimize latency while ensuring high reliability.
Current cellular system design follows a layered approach. The queueing latency¹ is managed at the MAC and higher layers, while the target (block) error rate² is managed separately by the physical layer to maximize the physical layer throughput. For example, the transmission rate (usually referred to as the modulation and coding scheme [5]) is often adapted to meet a fixed target error rate of around 10%. This decoupled design has been shown to be nearly throughput optimal [6] for single-antenna systems. However, such a decoupled design may not achieve low latency.
Xu Du and Ashutosh Sabharwal are with the Department of Electrical and Computer Engineering, Rice University, Houston, TX, 77005 (emails: xdurice@gmail.com, ashu@rice.edu). Yin Sun is with the Department of Electrical and Computer Engineering, Auburn University (email: yzs0078@auburn.edu). Ness B. Shroff is with the Departments of ECE and CSE at The Ohio State University (email: shroff@ece.rice.edu). This work has been supported in part by National Science Foundation awards CCF-1813078, CNS-1518916, CNS-1314822, CNS-1618566, CNS-1719371, CNS-1409336, and by the Office of Naval Research award N00014-17-1-2417.
¹ In this paper, we use queueing latency to represent the waiting time that packets spend in the MAC-layer queue, and overall latency to denote the total latency caused by retransmission and waiting in the MAC-layer queue.
² In this paper, we use target error rate when emphasizing the design of transmission control, and block error rate when emphasizing the probability of decoding error under a given transmission control.
As 5G pushes toward low latency (10-100× lower than in the LTE system [7]) and ultra-high reliability, it is of paramount importance to control the latency and service unreliability caused by retransmissions. Ultra-Reliable Low-Latency Communication (URLLC) has a reliability requirement of 99.9999% [8], i.e., the probability that a packet is successfully delivered within 4 rounds of transmission (0.25 ms per 5G frame) should be higher than 99.9999%. To satisfy this reliability requirement, the target error rate cannot exceed 3.16%: if decoding errors in different rounds are independent, a packet fails all 4 rounds with probability $\epsilon^4$, and $\epsilon^4 \le 10^{-6}$ requires $\epsilon \le 10^{-1.5} \approx 3.16\%$. For a given set of possible target error rates, it might be natural to choose the lowest one, which leads to the highest link reliability and the shortest retransmission latency. However, since the overall latency is the sum of the latency due to queueing and the latency due to retransmissions, a very small target error rate might result in long queueing latency and does not always minimize the overall latency. In this paper, we achieve reliability-guaranteed latency minimization by finding the target error rate and the transmission rate adaptation that jointly minimize the overall latency.
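The 3.16% figure is just the fourth root of the allowed residual failure probability. The snippet below is a minimal arithmetic check; it assumes, as the bound itself implicitly does, that decoding errors in the 4 rounds are independent with a common target error rate.

```python
# Largest per-attempt error rate eps such that all 4 attempts fail with
# probability at most 1 - 99.9999% (independence across attempts is assumed).
reliability = 0.999999
attempts = 4

eps_max = (1.0 - reliability) ** (1.0 / attempts)
print(f"eps_max = {eps_max:.4%}")   # about 3.16%

# Raising eps_max back to the 4th power recovers the residual failure probability.
assert abs(eps_max ** attempts - (1.0 - reliability)) < 1e-15
```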
While it is widely known that the block error rate decreases with a higher transmission power or a lower transmission rate, the relationship between the target error rate and the overall latency is more complex. There is a tradeoff between retransmission latency and queueing latency, both of which are affected by the target error rate. On the one hand, the retransmission latency decreases as the target error rate decreases. On the other hand, if the system is fixed to an extremely low target error rate, few packets can be transmitted in each frame, i.e., the time needed to send the same number of packets increases, and packets have to wait longer in the queue. Therefore, under a given arrival process, the queueing latency increases as the target error rate decreases. The situation is further complicated by the fact that current mobile users adapt their transmission power, which makes the feasible (transmission rate, target error rate) pairs time-varying. Fig. 1 depicts an example of the minimum overall latency achieved at different target error rates, where the transmission rate is optimized for each given target error rate; the details of how to optimize the transmission rate are discussed in Section III. For the specific example in Fig. 1, a target error rate of 1%, which is smaller than both the LTE target error rate (10%) and the URLLC reliability requirement (target error rate of 3.16%), results in the minimum overall latency. This demonstrates the need to find a target error rate that minimizes the overall latency by balancing the queueing latency against the retransmission latency.

[Fig. 1 plot: overall latency (ms) versus target error rate; annotations: "5G Reliability Bound", "Less Latency", "More Reliability".]
Fig. 1: An example illustrating the overall latency for different target error rates, where the transmission rate has been optimized for each given target error rate. A massive MIMO uplink system with 4 single-antenna users and 32 base-station antennas is considered. The channel traces are measured over the air on the Rice Argos platform, and the base station estimates the channel based on 8 pilot symbols per user. Evaluation details are given in Section VI.
In this paper, we model practical massive MIMO systems with retransmissions. To minimize the overall latency from both queueing and retransmission, we optimize the target error rate and the transmission rate adaptation. The main contributions of this paper are the following:
• We formulate a latency minimization problem for massive MIMO systems, in which the target error rate and the transmission rate are jointly optimized to minimize the overall latency, subject to the reliability constraint of URLLC. The arrival process is modeled as a memoryless discrete random process. This optimization problem is cast as a constrained Markov decision process and solved by value iteration.
• Because the Markov decision process solution provides little insight into the optimal control, we develop a deterministic control policy for massive MIMO with a large number of antennas and a constant arrival rate. We note that there exists an important type of 5G URLLC traffic with a constant data arrival rate, e.g., time-sensitive and throughput-hungry virtual reality (VR) service [9]. This deterministic control policy, named Large-arraY Reliability and Rate Control (LYRRC), has low complexity and a closed form: if the packet arrival rate is λ, the transmission rate of LYRRC is 2λ. In addition, the target error rate of LYRRC is a closed-form function of $F_\eta$, M, K, ρ, $p_I$, and τ, where $F_\eta$ is the CDF of the effective channel gain (defined later), M is the number of base-station antennas, K is the number of users, ρ is the traffic arrival load over link capacity, $p_I$ is the power of the interference from neighboring cells, and τ is the number of pilots. LYRRC is proven to be asymptotically optimal as the number of antennas grows to infinity. Furthermore, the total latency achieved by LYRRC can be expressed as a closed-form function of the number of base-station antennas M, the number of pilots τ, the number of served users K, and ρ. In particular, for ρ ∈ [0, 1), we show that the average waiting time diminishes to zero as M increases to infinity.
• To verify LYRRC's performance in the real world, we measure massive MIMO channels in the 2.4 GHz band with the Rice Argos platform [2], which consists of a 64-antenna base station and four mobile users. Numerical experiments based on the measured and simulated channels show that LYRRC with the 5G self-contained frame [5], [10] can simultaneously meet the 1 ms latency and 99.9999% reliability criteria. In the same scenario, the best latency of transmission rate control policies with a fixed target error rate of 10% is more than 5 ms. The evaluations demonstrate that LYRRC can provide a 400× latency reduction compared to current LTE transmission control, which has a target error rate of 10% and fixed per-frame transmission power control. Compared to the best queue-length based rate adaptation policy with a fixed target error rate of 10%, LYRRC achieves a 20× latency reduction.
Related Work: The majority of the massive MIMO literature focuses on achievable rate maximization, which assumes a full buffer and does not model the upper-layer latency from queueing. Massive MIMO was shown to provide higher spectral efficiency [11], [12], wider coverage [11], [12], and easier network interference management [11], [13], [14] than traditional MIMO. This work differs from previous massive MIMO physical layer work in that we provide reliability-guaranteed latency-optimal transmission control. Prior work has also optimized the retransmission process, for either throughput [6] or energy efficiency [15] maximization. Additionally, cross-layer optimization [16]-[19] has been proposed for latency reduction. For point-to-point systems, past studies [20]-[23] showed that using queue-length information for transmission rate control can reduce queueing latency. Finally, stochastic network calculus [24] has been used to capture the latency violation probability of multi-input single-output systems with perfect rate adaptation. Note that the perfect rate adaptation assumed in this past work implies neither decoding errors nor retransmission latency.
The remainder of this paper is structured as follows. In Section II, we provide a physical layer abstraction and a network model for the single-user latency minimization problem. Section III provides an algorithm to solve the formulated latency minimization problem. A simple yet latency-optimal transmission control policy, LYRRC, is investigated in the large-array regime in Section IV. In Section V, we extend our single-user analytical results to multiuser massive MIMO systems. We provide numerical results in Section VI and conclude in Section VII.
Notations: We use boldface to denote vectors/matrices. We use $|\cdot|$ to denote the magnitude of a complex number and $\|\cdot\|$ to denote the $\ell_2$ norm of a complex vector. The complex space is $\mathbb{C}$. The space of real values is $\mathbb{R}$, whose positive half is denoted as $\mathbb{R}_+$. The following notations are used to compare two nonnegative real-valued sequences $\{a_n\}$, $\{b_n\}$:

A. System Model
We consider a massive MIMO uplink system. The single-user case is considered first in Sections II-IV and is depicted in Fig. 2. The extension to multiuser systems is presented later in Section V. Each user is equipped with a single antenna, and the base station has M antennas. Based on the physical layer procedures defined in the first 5G release [5], we consider a system that operates in self-contained frames, as shown in Fig. 3. A self-contained frame consists of both data transmission and an immediate ACK/NACK. Without loss of generality, the duration of each frame is normalized to 1 unit, and Frame t spans the time interval [t, t + 1), t ≥ 0. In each frame, the user first transmits encoded data packets to the base station. The base station then feeds back an ACK or NACK to signal whether a decoding error occurred. The feedback is assumed to be error free.
1) Physical Layer Model: During the uplink data transmission, the signal received by the base station over the wideband channel is
$$\mathbf{y}_n = \sqrt{\gamma}\,\mathbf{h}_{t,n} x_n + \mathbf{z}_n, \quad n = 1, \ldots, N, \qquad (1)$$
where n is the subcarrier index, N is the total number of subcarriers, $x_n$ is the transmitted signal, $\mathbf{z}_n \in \mathbb{C}^M$ is a zero-mean circularly symmetric complex Gaussian noise vector, and $0 < \gamma \le 1$ is the large-scale channel gain. We model the channel fading processes as block Rayleigh fading, where the small-scale fading vector $\mathbf{h}_{t,n}$ remains the same during each frame and varies independently across frames and subcarriers.
In this paper, we may omit the frame index t in $\mathbf{h}_{t,n}$ when it is clear from the context. During each frame, the user transmits τ uplink pilots, each with power $p_\tau$. Let $\hat{\mathbf{h}}_n$ be the channel vector estimated by the base station via the MMSE estimator. The estimated channel satisfies [11], [12]
$$\mathbf{h}_n = \hat{\mathbf{h}}_n + \mathbf{e}_n, \qquad (2)$$
where $\mathbf{e}_n \in \mathbb{C}^M$ is a zero-mean, circularly symmetric complex Gaussian estimation error vector with per-entry variance $\frac{1}{1+\gamma p_\tau \tau}$. After applying conjugate beamforming, the obtained signal is
$$\hat{\mathbf{h}}_n^H \mathbf{y}_n = \sqrt{\gamma}\,\|\hat{\mathbf{h}}_n\|^2 x_n + \sqrt{\gamma}\,\hat{\mathbf{h}}_n^H \mathbf{e}_n x_n + \hat{\mathbf{h}}_n^H \mathbf{z}_n, \qquad (3)$$
where the three terms on the right-hand side represent the desired signal, the signal loss from imperfect channel knowledge, and the noise, respectively. The receive SINR on subcarrier n is [14], [25]
$$\mathrm{SINR}_n = \frac{\gamma p \|\hat{\mathbf{h}}_n\|^2}{1 + \frac{\gamma p}{1+\gamma p_\tau \tau}}, \qquad (4)$$
where $p = |x_n|^2$ is the power of the uplink data transmission. The user is aware of the large-scale channel gain γ and the distribution of the small-scale channel fading via the estimation of a periodic indication signal broadcast by the base station [5]. During each frame, all uplink packets to be transmitted are encoded in a single code block that spans all N subcarriers. The block error rate of the uplink transmission is a function of the transmission power. A closed-form characterization of the block error rate appears to be intractable when the code-block length is finite [26]. Hence, we employ the block error rate approximation developed in [6], [26]-[29]. Let L be the number of information bits in each packet, and let $r_t$ be the number of packets transmitted in Frame t. We refer to $r_t$ as the transmission rate. The block error rate of a code block with code-block length $L_{\text{code}}$ can be approximated as in (5) and, further, as in (6), where ν is the channel dispersion [26], [28] due to finite block length and is upper bounded by $\log_2(e)$. For systems with strong channel coding, [26] shows that (5) closely captures the block error rate when $L_{\text{code}} > 100$. The approximation in (6) is derived by considering a sufficiently large code-block length [6], [27], [29] and the high-SINR regime [6], [27]. Fig. 4 provides an illustration of the approximated block error rate in (6), in which an LDPC-based massive MIMO system is considered and the code-block length is chosen according to the DVB-S2 standard. Our simulations confirm the conclusions drawn from past work [6], [27], [29]. We hence adopt³ (6) as the block error rate model.
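To make the physical-layer abstraction concrete, the following Monte Carlo sketch estimates a block error rate under assumptions that are ours rather than quoted from the paper: $\mathbf{h}_n \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ with unit noise variance, MMSE estimation error of per-entry variance $1/(1+\gamma p_\tau\tau)$, the conjugate-beamforming SINR $\gamma p\|\hat{\mathbf{h}}_n\|^2/(1+\gamma p/(1+\gamma p_\tau\tau))$, and an outage-style error event $\sum_n \log_2 \mathrm{SINR}_n < rL$ standing in for the large-block, high-SINR approximation (6). The function name `block_error_rate` is hypothetical.

```python
import numpy as np

def block_error_rate(p, r, L, M, N, gamma, p_tau, tau, trials=10_000, seed=0):
    """Monte Carlo estimate of an outage-style block error rate (sketch).

    Assumptions (ours, for illustration):
      * h_n ~ CN(0, I_M) on each of N subcarriers, unit noise variance;
      * MMSE error variance sigma_e2 = 1 / (1 + gamma * p_tau * tau), so each
        entry of h_hat_n has variance 1 - sigma_e2;
      * SINR_n = gamma * p * ||h_hat_n||^2 / (1 + gamma * p * sigma_e2);
      * error event: sum_n log2(SINR_n) < r * L (large block, high SINR).
    """
    rng = np.random.default_rng(seed)
    sigma_e2 = 1.0 / (1.0 + gamma * p_tau * tau)
    # |h_hat_{n,m}|^2 is exponential with mean 1 - sigma_e2; sum over the M antennas.
    norm2 = rng.exponential(1.0 - sigma_e2, size=(trials, N, M)).sum(axis=2)
    sinr = gamma * p * norm2 / (1.0 + gamma * p * sigma_e2)
    return float(np.mean(np.log2(sinr).sum(axis=1) < r * L))

# r = 4 packets of L = 4 bits over N = 4 subcarriers, with gamma * p_tau * tau = 8.
for M in (16, 64):
    print(M, block_error_rate(p=1.0, r=4, L=4, M=M, N=4, gamma=1.0, p_tau=1.0, tau=8))
```

With these illustrative numbers the estimate should fall sharply between M = 16 and M = 64, reflecting the linear growth of the SINR with the number of antennas.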
2) Buffer Dynamics with Retransmission: We assume that there is no packet in the buffer at time 0. During each frame, λ new packets arrive in the queue⁴, and each packet contains L bits. In each frame, the user receives downlink ACK/NACK feedback from the base station. Upon ACK, the transmitted packets are removed from the buffer. Upon NACK, the transmitted packets remain at the head of the buffer queue⁵. We use the indicator function $1_t$ to represent decoding success: $1_t = 1$ means success and $1_t = 0$ otherwise. The distribution of $1_t$ is determined by the chosen target error rate ε as
$$\Pr(1_t = 0) = \epsilon, \qquad \Pr(1_t = 1) = 1 - \epsilon.$$
At time t, let $q_t$ be the queue length of the buffer, and let $r_t$ be the number of packets to be transmitted in Frame t as per the control decision. The queue length evolves according to
$$q_{t+1} = \min\{q_t - r_t 1_t + \lambda,\; B\}, \qquad (7)$$
where B is the size of the buffer. If the buffer cannot store all the packets waiting to be transmitted, an overflow event occurs.
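The number of dropped packets due to buffer overflow is given by
$$b_t = \max\{q_t - r_t 1_t + \lambda - B,\; 0\}. \qquad (8)$$
The average number of dropped packets due to overflow, measured in packets per frame, is $\lambda_{\text{drop}} = \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} b_t$. When packet overflow happens, the dropped packets induce significant latency for time-sensitive applications. We assume that each overflowed packet incurs a large latency penalty $D_{\text{drop}}$. We are interested in minimizing the overall latency (from arrival to successful delivery). We assume that the class of stationary policies is complete, i.e., the minimum latency can be achieved by a stationary policy. Under a stationary policy, the queueing latency of successfully served packets is $\lim_{T\to\infty} \frac{\sum_{t=0}^{T-1} q_t}{\sum_{t=0}^{T-1} (\lambda - b_t)} = \frac{\bar{q}}{\lambda - \lambda_{\text{drop}}}$, which is derived by using Little's Law [30]. To summarize, if a packet is dropped, its latency is $D_{\text{drop}}$, and if a packet is successfully served (not dropped), its latency is $\frac{\bar{q}}{\lambda - \lambda_{\text{drop}}}$. The average latency is therefore
$$D = \theta\,\frac{\bar{q}}{\lambda - \lambda_{\text{drop}}} + (1 - \theta)\,D_{\text{drop}}, \qquad (9)$$
where $\theta = \frac{\lambda - \lambda_{\text{drop}}}{\lambda}$ is the proportion of successfully served packets and $\bar{q}$ is the average queue length, i.e., $\bar{q} = \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} q_t$.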
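The buffer model can be simulated directly. The hypothetical helper `simulate_queue` below is a sketch that assumes the queue update $q_{t+1} = \min\{q_t - r_t 1_t + \lambda, B\}$ and the overflow count $b_t = \max\{q_t - r_t 1_t + \lambda - B, 0\}$, draws $1_t$ as Bernoulli$(1-\epsilon)$, and evaluates the latency as $\bar q/\lambda + (\lambda_{\text{drop}}/\lambda) D_{\text{drop}}$, which is the packet-weighted mix of the served-packet latency $\bar q/(\lambda-\lambda_{\text{drop}})$ and the drop penalty.

```python
import numpy as np

def simulate_queue(policy, eps, lam, B, D_drop, T=200_000, seed=1):
    """Simulate the retransmission buffer (illustrative sketch).

    Assumed dynamics:
        q_{t+1} = min(q_t - r_t * 1_t + lam, B)
        b_t     = max(q_t - r_t * 1_t + lam - B, 0)
    with 1_t ~ Bernoulli(1 - eps). Returns (latency in frames, lambda_drop).
    """
    rng = np.random.default_rng(seed)
    ack = rng.random(T) >= eps              # True with probability 1 - eps
    q, q_sum, dropped = 0, 0, 0
    for t in range(T):
        q_sum += q                          # accumulate q_t before the update
        r = policy(q)                       # packets sent in Frame t (r <= q)
        backlog = q - (r if ack[t] else 0) + lam
        dropped += max(backlog - B, 0)
        q = min(backlog, B)
    q_bar, lam_drop = q_sum / T, dropped / T
    # Little's Law for served packets, plus D_drop for each dropped packet.
    latency = q_bar / lam + (lam_drop / lam) * D_drop
    return latency, lam_drop

lam = 2
lyrrc_rule = lambda q: min(q, 2 * lam)      # r_t = min(q_t, 2 * lambda)
print(simulate_queue(lyrrc_rule, eps=0.1, lam=lam, B=10**9, D_drop=100.0))
print(simulate_queue(lyrrc_rule, eps=0.5, lam=lam, B=4, D_drop=100.0))
```

With an effectively infinite buffer and ε = 0.1, the first run should land close to 1 + 0.1/0.8 = 1.125 frames, the latency predicted for this rate rule by the Markov-chain analysis of Section IV; the tiny buffer in the second run forces drops and a large penalty.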

3) Transmission Power Adaptation:
The transmission power of the user must satisfy a long-term power constraint P. In Frame t, the transmission power is adapted, based on the transmission rate $r_t$ and the number of pilots τ, to achieve the target error rate ε. The transmission power is quantified as follows. Substituting (4) into (6), the block error rate is approximated as in (10), where $\kappa_n$ is the per-antenna gain of the small-scale channel fading, given by
$$\kappa_n = \frac{\|\hat{\mathbf{h}}_n\|^2}{M} = \frac{1}{M}\sum_{m=1}^{M} |\hat{h}_{n,m}|^2. \qquad (11)$$
The per-antenna gain $\kappa_n$ is the arithmetic mean of the small-scale channel gain across the M antennas because the signals received at different antennas are combined by the linear beamformer. The left-hand side of the inequality in (10) is determined by the small-scale fading, and the right-hand side of (10) is a constant independent of the small-scale fading. For ease of subsequent presentation, we define
$$\eta = \Big(\prod_{n=1}^{N} \kappa_n\Big)^{1/N}, \qquad (12)$$
which is called the effective channel gain. The effective channel gain (12) is the geometric mean across the N subcarriers because the maximum outage-free rate [26] can be approximated by the logarithm of the product of the per-subcarrier SINRs. Recalling that the transmission power is adapted to achieve the target error rate, from (10) we obtain the power mapping (13), in which $F_\eta^{-1}$ is the inverse CDF of the effective channel gain η in (12). When τ increases, the base station has a more accurate channel estimate, and the transmission power needed (at the same rate and with the same reliability) decreases. One can observe that the required transmission power increases with the transmission rate r and the packet size L, and decreases with the number of base-station antennas M, the number of subcarriers N, and the number of pilots τ.
Fig. 4: Block error rate of a coded system as a function of the mean SINR with N = 1. In the simulation, the channel gain follows a normal distribution with the labeled variance. The approximations are obtained by (6). The simulation uses an LDPC code [31] whose sparse parity-check matrix is taken from the DVB-S2 standard. The transmission rate is 1.5 bits per symbol (8-QAM, code rate 0.5).
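The quantities $\kappa_n$, η, and $F_\eta^{-1}$ can be estimated numerically. The sketch below draws $\kappa_n$ as in (11) under i.i.d. Rayleigh fading with MMSE error variance $\sigma_e^2 = 1/(1+\gamma p_\tau\tau)$, forms η as in (12), and replaces $F_\eta^{-1}(\epsilon)$ by an empirical quantile. Because (10) and (13) are not reproduced here, the power mapping is an assumption of ours: it solves $F_\eta^{-1}(\epsilon) = 2^{rL/N}(1+\gamma p\sigma_e^2)/(M\gamma p)$ for p, which is what a high-SINR outage condition gives if the SINR has the form $M\gamma p\kappa_n/(1+\gamma p\sigma_e^2)$. Both helper names are hypothetical.

```python
import numpy as np

def effective_gain_samples(M, N, gamma, p_tau, tau, trials=10_000, seed=2):
    """Samples of eta (12), with kappa_n from (11), under i.i.d. Rayleigh fading."""
    rng = np.random.default_rng(seed)
    sigma_e2 = 1.0 / (1.0 + gamma * p_tau * tau)
    # kappa_n: mean of M exponential per-antenna gains with mean 1 - sigma_e2.
    kappa = rng.exponential(1.0 - sigma_e2, size=(trials, N, M)).mean(axis=2)
    eta = np.exp(np.log(kappa).mean(axis=1))     # geometric mean over subcarriers
    return eta, sigma_e2

def required_power(eps, r, L, M, N, gamma, eta, sigma_e2):
    """Power p solving F_eta^{-1}(eps) = 2^{rL/N} (1 + gamma p sigma_e2) / (M gamma p).

    Hypothetical mapping under the SINR assumption stated above; F_eta^{-1}
    is an empirical quantile. Returns inf when no finite power achieves eps.
    """
    eta_eps = float(np.quantile(eta, eps))
    c = 2.0 ** (r * L / N)
    denom = gamma * (M * eta_eps - c * sigma_e2)
    return c / denom if denom > 0 else float("inf")

eta64, s_e2 = effective_gain_samples(M=64, N=4, gamma=1.0, p_tau=1.0, tau=8)
eta16, _ = effective_gain_samples(M=16, N=4, gamma=1.0, p_tau=1.0, tau=8)
for r in (4, 8):
    print(r, required_power(0.01, r, 4, 64, 4, 1.0, eta64, s_e2),
          required_power(0.01, r, 4, 16, 4, 1.0, eta16, s_e2))
```

The `inf` return marks (ε, r) pairs that no finite power can support under this model; the monotone behavior in r, M, and τ matches the qualitative statements above.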

B. Single-user Latency Minimization Problem
We now formulate the single-user latency minimization problem. The objective of the joint target error rate and transmission rate control is to minimize the average packet latency under a long-term average power constraint. The system state is the queue length $q_t$, whose state space is $\mathcal{Q} = \{0, 1, \ldots, B\}$. The transmission controller determines the number of transmitted packets $r_t$ at the beginning of each frame based on the queue length $q_t$, as well as the target error rate ε, which remains constant over all frames. Recall that the transmission rate is the number of transmitted packets $r_t$. We consider the set of stationary policies such that $r_t = \mu(q_t)$, where $\mu: \mathcal{Q} \to \mathbb{R}_+$ is a function, and the target error rate ε is chosen from a finite set $\mathcal{E}$. Finally, the transmission power $p_t$ is adapted based on the designed rate $r_t$, the target error rate ε, and the number of pilots τ as in (13). Both the transmission rate function µ and the resulting transmission power are independent of the exact small-scale fading $\mathbf{h}_n$, as it is unknown to the user.
For any target error rate ε and transmission rate function µ, we assume that the resulting Markov chain of the system states is ergodic, i.e., the unichain condition is satisfied. The associated unique steady-state distribution of the system is denoted as π.
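The latency minimization problem is formulated as
$$\begin{aligned} \min_{\epsilon \in \mathcal{E},\, \mu}\ & D \\ \text{s.t.}\ & \lim_{T\to\infty} \frac{1}{T}\sum_{t=0}^{T-1} \mathbb{E}[p_t] \le P, \quad \epsilon \le \epsilon_{\max}, \\ & \text{State Transition Model (4)-(8)}, \end{aligned} \qquad (14)$$
where D is the average latency in (9) and $\epsilon_{\max}$ is the maximum allowed target error rate due to the reliability requirement. For 5G URLLC, $\epsilon_{\max} = (1 - 99.9999\%)^{1/4} \approx 3.16\%$. The optimal objective value of (14) is denoted as $D^*$, or $D^*(M)$ when we need to emphasize the dependence on the number of antennas M. Hence, $D^*(M)$ captures the minimum overall latency as a function of the number of base-station antennas M.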

III. LATENCY-OPTIMAL SINGLE-USER TRANSMISSION CONTROL
In this section, we first formulate the latency minimization problem (14) as a constrained average-cost Markov decision process (MDP) and solve it with the proposed algorithm. The proposed algorithm can also find the latency-optimal control for general point-to-point MIMO systems by replacing the per-subcarrier SINR in (4) with the SINR of the MIMO system; the effective channel gain in (12) and the power mapping in (13) should then be modified accordingly.

A. Lagrange Duality of the MDP
For a target error rate ε ∈ $\mathcal{E}$ and a stationary transmission rate adaptation $\mu: \mathcal{Q} \to \mathbb{R}_+$, based on the definition of the average latency (9), we define the induced latency cost mapping d on each state-action pair, where b is the number of packets dropped due to buffer overflow as given in (8). In Frame t, both a latency cost and a transmission power cost are incurred. The average overall latency $D_\pi$ over the infinite horizon is the long-run average of this latency cost. Similarly, utilizing the transmission power characterization in (13), the average power $P_\pi$ is the long-run average of the per-frame transmission power. Given an average power constraint P, the objective of the joint target error rate selection and transmission rate control is restated as the constrained MDP
$$\begin{aligned} \text{minimize}\ & D_\pi \\ \text{subject to}\ & P_\pi \le P, \quad \epsilon \le \epsilon_{\max}, \\ & \text{State Transition Model (4)-(8)}. \end{aligned} \qquad (15)$$
The constrained MDP (15) is converted via Lagrangian relaxation into the unconstrained MDP (16), which minimizes $D_\pi + \beta(P_\pi - P)$ subject to $\epsilon \le \epsilon_{\max}$ and the state transition model, where β ≥ 0 is the Lagrange multiplier of the power constraint. For ergodic MDPs, [32], [33] provide a sufficient condition under which the solution of the unconstrained MDP is also optimal for the original constrained problem (14). For all policies such that $P_\pi = P$, the sufficient condition provided by [32], [33] is satisfied. Thus, when the power constraint is binding, there is zero duality gap between the original problem (14) and the unconstrained MDP (16), i.e., they have the same optimal solution.
We now present the algorithm that solves (16) in Section III-B. The closed-form solution of (16) and the characterization of the array-latency tradeoff $D^*(M)$ are presented in Section IV.

B. A Value Iteration Based Algorithm
Problem (16) is an MDP with an average-cost criterion over an infinite horizon. To find the optimal target error rate, we need to find the optimal transmission rate adaptation and the corresponding achievable latency for each ε ∈ $\mathcal{E}$ with $\epsilon \le \epsilon_{\max}$. For each target error rate ε, we can use binary search to find the smallest β that satisfies the long-term power constraint P in (16). Such a β corresponds to the latency-optimal solution of (15) because, for each ε, the average power is monotonically non-increasing in β > 0. Finally, for each ε and β, we find the optimal transmission rate adaptation µ* by considering the α-discounted version [34] of (16). We now present a solution to each discounted problem. For each system state q, we define the value cost function $V_\alpha(q)$, where α ∈ (0, 1) is the discount factor. For each ε and β, we need to find a stationary transmission rate adaptation that is optimal for all α-discounted problems with α sufficiently close to 1, i.e., the Blackwell optimal policy. For the considered finite-state MDP, the Blackwell optimal policy [34] exists and is also optimal for the average-cost problem (16). The α-discounted problem is characterized by the Bellman equation (17), whose state transitions are described by (6), (7), and (8), and it can be solved by dynamic programming with value iteration [34]. Since the discounted cost $V_\alpha$ is bounded, [34] shows that solving (17) generates the optimal transmission rate control µ*.
Algorithm 1
for each ε ∈ E with ε ≤ ε_max do
    Find the smallest β that satisfies the average power constraint (δ is a small constant that controls the output accuracy):
    while β_max − β_min > δ do
        β ← (β_max + β_min)/2;
        Initialize V^0_α(q) for every system state q ∈ Q and set n = 1;
        Solve for V^1_α from V^0_α via value iteration as in (17);
        Find the optimal µ for each β and ε:
        repeat
            n ← n + 1; update V^n_α from V^{n−1}_α via value iteration as in (17);
        until V^n_α converges
        Compute the corresponding average power P_tmp;
        if P_tmp > P then β_min ← β else β_max ← β;
    end while
    Denote the solved transmission rate function as µ_ε(q_t) and the resulting latency as D_ε;
end for
Optimal policy extraction: ε* = arg min_{ε∈E, ε≤ε_max} D_ε, µ*(q_t) = µ_{ε*}(q_t), and D* = D_{ε*}.
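For readers who want to experiment, here is a toy, small-buffer rendition of Algorithm 1. It is a sketch under assumptions of ours: arrivals are a constant λ packets per frame, the per-frame latency cost is $q/\lambda + bD_{\text{drop}}/\lambda$ (consistent with $D = \bar q/\lambda + (\lambda_{\text{drop}}/\lambda)D_{\text{drop}}$), a fixed discount α stands in for the Blackwell-optimal/average-cost policy, the unichain condition is assumed when computing the stationary distribution, and `power(r, eps)` is a purely hypothetical power model. Only deterministic policies are searched, so the power constraint may end up slack rather than binding.

```python
import numpy as np

def step(q, r, eps, lam, B):
    """(probability, next queue length, dropped packets) after ACK and after NACK."""
    outcomes = []
    for prob, served in ((1.0 - eps, r), (eps, 0)):
        backlog = q - served + lam
        outcomes.append((prob, min(backlog, B), max(backlog - B, 0)))
    return outcomes

def solve_fixed(eps, beta, lam, B, D_drop, power, alpha, tol=1e-6):
    """Value iteration on the alpha-discounted Lagrangian MDP for fixed (eps, beta).
    Returns the greedy policy mu with its long-run average latency and power."""
    moves = {(q, r): step(q, r, eps, lam, B) for q in range(B + 1) for r in range(q + 1)}
    pw = [power(r, eps) for r in range(B + 1)]
    V = np.zeros(B + 1)
    while True:
        V_new, mu = np.empty(B + 1), np.zeros(B + 1, dtype=int)
        for q in range(B + 1):
            costs = [q / lam + beta * pw[r]
                     + sum(p * (b * D_drop / lam + alpha * V[nq]) for p, nq, b in moves[q, r])
                     for r in range(q + 1)]
            mu[q] = int(np.argmin(costs))
            V_new[q] = costs[mu[q]]
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Stationary distribution of the chain induced by mu (unichain assumed).
    T = np.zeros((B + 1, B + 1))
    for q in range(B + 1):
        for p, nq, _ in moves[q, int(mu[q])]:
            T[q, nq] += p
    A = np.vstack([T.T - np.eye(B + 1), np.ones(B + 1)])
    rhs = np.zeros(B + 2)
    rhs[-1] = 1.0
    pi = np.linalg.lstsq(A, rhs, rcond=None)[0]
    exp_drop = np.array([sum(p * b for p, _, b in moves[q, int(mu[q])]) for q in range(B + 1)])
    latency = float(pi @ (np.arange(B + 1) + exp_drop * D_drop) / lam)
    avg_power = float(pi @ np.array([pw[int(r)] for r in mu]))
    return mu, latency, avg_power

def algorithm1(E, eps_max, lam, B, D_drop, P_bar, power, alpha=0.95, delta=1e-2, beta_hi=50.0):
    """Admissible eps values, binary search for the smallest feasible beta, then argmin."""
    best = None
    for eps in [e for e in E if e <= eps_max]:
        lo, hi = 0.0, beta_hi                 # beta_hi is assumed to be feasible
        while hi - lo > delta:
            beta = (lo + hi) / 2.0
            _, _, P_tmp = solve_fixed(eps, beta, lam, B, D_drop, power, alpha)
            if P_tmp > P_bar:
                lo = beta
            else:
                hi = beta
        mu, D, P_avg = solve_fixed(eps, hi, lam, B, D_drop, power, alpha)
        if best is None or D < best[1]:
            best = (eps, D, P_avg, mu)
    return best

def power(r, eps):
    """Hypothetical power model: exponential in the rate, logarithmic in 1/eps."""
    return (2.0 ** (r / 2.0) - 1.0) * (1.0 + np.log(1.0 / eps))

eps_star, D_star, P_star, mu_star = algorithm1(
    E=[0.001, 0.01, 0.1], eps_max=0.0316, lam=1, B=8, D_drop=20.0, P_bar=3.0, power=power)
print(eps_star, round(D_star, 4), round(P_star, 3), mu_star)
```

With these made-up numbers, sending one packet per frame at ε = 0.001 costs about 3.28 > 3 under the hypothetical power model, so that target cannot sustain the arrivals and the search should settle on ε = 0.01: a small-scale echo of the queueing/retransmission tradeoff.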
We summarize the above steps in Algorithm 1, which solves (15) to find the optimal target error rate and transmission rate adaptation. To provide insight into the structure of optimal transmission controls, we now present closed-form characterizations for M → ∞ in Section IV.

IV. LARGE-ARRAY LATENCY-OPTIMAL CONTROL
In this section, we derive the latency-optimal control for the single-user problem (14) when the number of base-station antennas M → ∞. For the single-user system in Rayleigh fading, the per-antenna gain $\kappa_n$ in (11) satisfies the following properties [11, A.2.4], [12], [14].
• Mean: The per-antenna gain mean is a constant that is independent of M, i.e., $\mathbb{E}[\kappa_n] = c_1$ for some constant $c_1 > 0$ that does not depend on M. (18)
• Variance: The per-antenna gain variance is inversely proportional to M, i.e., $\mathrm{Var}[\kappa_n] = c_2/M$ for some constant $c_2 > 0$ that does not depend on M. (19)
In Section V, we will show that a multiuser massive MIMO channel can be decoupled into parallel single-user channels. For each of the decoupled channels, the per-antenna gain also has a variance that is inversely proportional to M. Based on condition (18), the achievable SINR grows linearly with the number of base-station antennas M. As the focus of the current section is on the asymptotic analysis with M → ∞, we can view log M as the link "capacity". In the same spirit, we define the system utilization factor as a constant ρ ∈ [0, 1),
$$\rho = \frac{\lambda L}{N \log M}, \qquad (20)$$
where λ is the packet arrival rate, L is the number of bits in each packet, and N is the number of subcarriers. By (20), the packet arrival rate λ increases with M and equals $\frac{\rho N \log M}{L}$. Conceptually, the term N log M can be viewed as the total "capacity" of the wideband link and λL as the data load. Thus, the utilization factor ρ can be interpreted as the ratio between the offered data load and the total link "capacity".
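The relation between ρ and λ in (20) is simple arithmetic. The sketch below assumes a base-2 logarithm, so that N log M is in bits per frame, and ignores the fact that λ should be an integer number of packets; the numbers are illustrative, not from the paper. It also checks that the LYRRC rate 2λ carries $2\rho N\log M$ bits, which exceeds N log M exactly when ρ > 1/2.

```python
import math

def arrival_rate(rho, M, N, L):
    """Packets per frame implied by (20): lambda = rho * N * log2(M) / L."""
    return rho * N * math.log2(M) / L

def utilization(lam, M, N, L):
    """Inverse of arrival_rate: rho = lambda * L / (N * log2(M))."""
    return lam * L / (N * math.log2(M))

M, N, L = 64, 100, 50                 # illustrative numbers, not from the paper
lam = arrival_rate(0.5, M, N, L)      # 0.5 * 100 * 6 / 50 = 6.0 packets per frame
print(lam, utilization(lam, M, N, L))

# Bits carried by the LYRRC rate 2 * lambda versus the link "capacity" N * log2(M).
for rho in (0.3, 0.7):
    print(rho, 2 * arrival_rate(rho, M, N, L) * L > N * math.log2(M))
```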
We also make the following assumptions for mathematical tractability. We consider an infinite buffer (i.e., B → ∞); thus, no buffer overflow or overflow latency occurs. In addition, the target error rate can be chosen from the continuous set (0, 1).

A. Array-Latency Scaling Lower Bound
Notice that a trivial lower bound of $D^*(M)$ is 1 frame, which corresponds to the first transmission attempt of a packet. This 1-frame lower bound can only be achieved if the target error rate is exactly zero. We now provide a tighter lower bound on the array-latency curve $D^*(M)$.
Theorem 1 (Latency Scaling Lower Bound). The optimum array-latency curve $D^*(M)$ satisfies the lower bound (21), where $\epsilon_o$ is given by (22), $F_\eta(\cdot)$ is the CDF of the effective channel gain η in (12), ρ ∈ [0, 1) is the utilization factor in (20), and τ is the number of pilots.
Proof.The main idea is to lower bound the overall latency by the packet retransmission latency, which monotonically increases with the target error rate.To complete the proof, we use Jensen's inequality to show that there exists a minimum target error rate o such that for any < o the long-term throughput is smaller than λ.Appendix A provides the proof details.
Theorem 1 presents a latency lower bound. For any transmission rate adaptation, $\epsilon_o$ is the minimum target error rate that leads to a long-term throughput no smaller than λ. If the target error rate is smaller than $\epsilon_o$, the queue-length process is not stable. By the definition of η in (12), the per-antenna mean (18), and the per-antenna variance (19), Chebyshev's inequality can be used to show that $\epsilon_o$ converges to 0 as the number of base-station antennas M increases to infinity. This convergence is a consequence of the channel hardening effect. Hence, the latency lower bound (21) approaches the trivial 1-frame bound as M → ∞.
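Channel hardening can be seen numerically. Assuming i.i.d. Rayleigh fading so that each $|\hat h_{n,m}|^2$ is exponential with mean $v = \gamma p_\tau\tau/(1+\gamma p_\tau\tau)$ (our illustrative choice is $\gamma p_\tau\tau = 8$), $\kappa_n$ on one subcarrier has variance $v^2/M$, in line with (19), and Chebyshev's inequality bounds $\Pr(\kappa_n \le v - t)$ by $v^2/(Mt^2)$. The sketch compares that bound with the empirical probability for a few array sizes.

```python
import numpy as np

rng = np.random.default_rng(3)
v = 8.0 / 9.0                 # assumed E|h_hat_{n,m}|^2 with gamma * p_tau * tau = 8
t = v / 2.0                   # deviation below the mean
trials = 20_000

results = {}
for M in (8, 32, 128):
    kappa = rng.exponential(v, size=(trials, M)).mean(axis=1)  # per-antenna gain (11)
    scaled_var = M * kappa.var()                                # should stay near v**2
    p_low = float(np.mean(kappa <= v - t))                      # empirical Pr(kappa <= v - t)
    cheb = v**2 / (M * t**2)                                    # Chebyshev bound (= 4 / M here)
    results[M] = (scaled_var, p_low, cheb)
    print(f"M={M:4d}  M*Var={scaled_var:.3f}  Pr={p_low:.2e}  bound={cheb:.3f}")
```

The Chebyshev bound is loose, but it already decays like 1/M, which is all the convergence argument for $\epsilon_o$ needs.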
If $\tau p_\tau$ is small, the channel estimation error is large. As a result, both $\epsilon_o$ and the latency lower bound are large, and neither high reliability nor low latency can be achieved. Hence, sufficiently accurate channel estimation is necessary for achieving high reliability and low latency.

B. Large-Array Optimal Target Error Rate and Transmission Rate Control
In this subsection, we present a simple transmission control policy that achieves the latency lower bound in (21) asymptotically as M → ∞.
Definition.We define the Large-arraY Reliability and Rate Control (LYRRC) as * = o µ * : r t (q t ) = min (q t , 2λ) , where o is given by (22).
The LYRRC policy contains two parts: a target error rate $\epsilon_o$ and a transmission rate control policy µ*. The transmission rate adaptation µ* is a simple thresholding rule: if there are at least 2λ packets in the buffer queue, i.e., q ≥ 2λ, then 2λ packets are transmitted; if fewer than 2λ packets are currently in the buffer, all packets in the queue are scheduled for transmission in the frame. In each frame, based on the transmission rate min($q_t$, 2λ), the user uses power adaptation (13) to achieve the target error rate $\epsilon_o$.
To evaluate LYRRC, we first derive the latency under an arbitrary target error rate $\epsilon < \frac{1}{2}$ and the transmission rate policy µ*. We then prove the asymptotic optimality of LYRRC (23) by comparing the achieved latency with the latency lower bound in Theorem 1.
Proof.The main idea is to compute the steady state distribution of the queue-length, which is a Markov chain with infinite countable states.Appendix B provides the complete proof.
Lemma 1 provides a closed-form characterization of the latency under the transmission rate adaptation µ* when the buffer is infinite. To provide insight into the proof of Lemma 1, we consider the associated Markov chain of the buffer length. The buffer-length state transitions under any target error rate ε ∈ (0, 1), which is not necessarily equal to $\epsilon_o$, and the transmission rate adaptation µ* are depicted in Fig. 5. By Little's Law, the overall latency equals the ratio between the average queue length and the arrival rate λ. Notice that λ is the difference between adjacent states in Fig. 5. Hence, the average queue length is proportional to λ (see Appendix B for a rigorous proof). As a result, the overall latency depends only on the target error rate ε, not on λ.
To summarize, whenever $q_t \ge 2\lambda$, the transmission rate control policy µ* decreases the queue length by λ with probability 1 − ε and increases it by λ with probability ε, i.e., it applies a net negative drift of −λ(1 − 2ε) per frame towards the minimum queue length λ. To minimize the latency as M → ∞, the queue length needs to be regulated towards the minimum queue length λ, which is achieved by selecting a smaller target error rate.
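For completeness, the birth-death structure behind Lemma 1 can be written out. The short derivation below is our own reconstruction under the infinite-buffer model with the queue update $q_{t+1} = q_t - r_t 1_t + \lambda$; it writes $q_t = k\lambda$ with $k \in \{1, 2, \ldots\}$ (the empty initial state is transient) and recovers the latency $1 + \epsilon/(1-2\epsilon)$ used for LYRRC. The condition ε < 1/2 is exactly the requirement that the net drift −λ(1 − 2ε) be negative.

```latex
\begin{aligned}
&\text{Transitions under } \mu^*:\quad k \to k+1 \text{ w.p. } \epsilon, \qquad k \to \max(k-1,\,1) \text{ w.p. } 1-\epsilon,\\
&\text{Detailed balance:}\quad \pi_k\,\epsilon = \pi_{k+1}(1-\epsilon)
  \;\Rightarrow\; \pi_k = (1-a)\,a^{k-1}, \quad a = \frac{\epsilon}{1-\epsilon} < 1 \iff \epsilon < \tfrac{1}{2},\\
&\text{Little's Law:}\quad D = \frac{\mathbb{E}[q]}{\lambda} = \sum_{k \ge 1} k\,\pi_k = \frac{1}{1-a} = \frac{1-\epsilon}{1-2\epsilon} = 1 + \frac{\epsilon}{1-2\epsilon}.
\end{aligned}
```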
By Lemma 1, the latency achieved by LYRRC is $D_{\text{LYRRC}}(M) = 1 + \frac{\epsilon_o}{1 - 2\epsilon_o}$. As mentioned above, the target error rate $\epsilon_o$ of LYRRC (23) decreases as the number of base-station antennas increases. Hence, the achieved latency $D_{\text{LYRRC}}$ decreases with more base-station antennas. We now prove the asymptotic optimality of LYRRC.
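The closed form can also be checked numerically against a truncated version of the chain in Fig. 5. In the sketch below (hypothetical helper names), state index i stands for $q = (i+1)\lambda$, the chain is truncated at K states, and the stationary law is found by power iteration; the two values should agree to high precision for ε well below 1/2.

```python
import numpy as np

def d_lyrrc(eps):
    """Closed-form LYRRC latency in frames, 1 + eps / (1 - 2 eps), valid for eps < 1/2."""
    return 1.0 + eps / (1.0 - 2.0 * eps)

def d_lyrrc_numeric(eps, K=200, iters=5000):
    """Mean of k = q / lambda under the mu* chain, truncated to k in {1, ..., K}."""
    P = np.zeros((K, K))
    for i in range(K):                      # index i represents k = i + 1
        P[i, max(i - 1, 0)] += 1.0 - eps    # ACK: queue shrinks by lambda (k stays >= 1)
        P[i, min(i + 1, K - 1)] += eps      # NACK: queue grows by lambda
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):                  # power iteration: pi <- pi P
        pi = pi @ P
    return float(pi @ np.arange(1, K + 1))

for eps in (0.01, 0.1, 0.3):
    print(eps, d_lyrrc(eps), d_lyrrc_numeric(eps))
```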
Proof.We first characterize the gap between latency under LYRRC and minimum latency by combining Lemma 1 and Theorem 1.The proof is complete by using the large deviation theory to show that the power constraint is satisfied.Please see Appendix C for details.
Theorem 2 establishes the asymptotic optimality of LYRRC. In addition, the latency gap between the lower bound and LYRRC increases as the channel estimation error increases (i.e., as τ decreases). Furthermore, Lemma 1 and Theorem 2 suggest that the latency-optimal target error rate increases for systems with fewer base-station antennas. Hence, the reliability and low-latency objectives of 5G URLLC do not always align for practical massive MIMO systems with finite M. Finally, we note that LYRRC achieves optimal latency for any ρ ∈ [0, 1), which may seem to contradict the use of the transmission rate min(q_t, 2λ). This is explained by the fact that we consider a wireless link with power adaptation, and the probability of transmitting at 2λ decreases as M → ∞. Therefore, using larger transmission power over a few frames can raise the peak transmission rate beyond the long-term average rate. We next combine Theorem 1 and Theorem 2 to characterize the scaling of the array-latency curve D*(M) in closed form.
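The last remark can be made concrete with a short derivation for the queue-length chain of Fig. 5, sketched under the assumptions of an infinite buffer, states iλ with i ≥ 1, and i.i.d. decoding errors with probability ε. Balancing the flow between adjacent levels, $\pi_{i\lambda}\,\epsilon = \pi_{(i+1)\lambda}(1-\epsilon)$, gives a geometric stationary law, from which the fraction of frames using the peak rate 2λ and the average transmission rate follow:

```latex
\pi_{\lambda} = \frac{1-2\epsilon}{1-\epsilon}, \qquad
\Pr\{\mu^*(q_t) = 2\lambda\} = 1 - \pi_{\lambda} = \frac{\epsilon}{1-\epsilon},
\\[4pt]
\mathbb{E}[\mu^*(q_t)] = \lambda\,\pi_{\lambda} + 2\lambda\,(1-\pi_{\lambda})
  = \frac{\lambda}{1-\epsilon}, \qquad
(1-\epsilon)\,\mathbb{E}[\mu^*(q_t)] = \lambda .
```

Since ε_o → 0 as M → ∞ under LYRRC, the fraction ε_o/(1 − ε_o) of frames that need the peak rate 2λ vanishes, while the successfully delivered rate (1 − ε)E[µ*] stays equal to λ.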
Theorem 3 (Large-Array Latency Scaling). As M → ∞, for any positive τ and ρ ∈ [0, 1), the optimum latency converges to 1 frame as where F_η(·) is the CDF of the effective channel gain η, and ε_o is given by (22).
Proof. Theorem 1 provides a latency lower bound. LYRRC, whose latency is characterized in Theorem 2, serves as an achievability scheme and provides an upper bound. The proof is completed by showing that the ratio of the upper bound to the lower bound converges to 1 as M → ∞.
Theorem 3 describes, in closed form, the minimum latency D* as a function of the utilization factor ρ, the channel estimation error, and the number of base-station antennas M. As M → ∞, ε_o → 0; thus, both the retransmission and queueing latencies converge to 0 frames. Finally, we comment on the impact of imperfect channel state information. For any τ > 0, the latency converges to 1 frame as M → ∞. For a practical system with finite M, more accurate channel estimation leads to smaller latency.

V. MULTI-USER EXTENSION
In this section, we consider the K-user latency minimization problem over the lossy channel. Throughout this section, the suffix [k] denotes quantities of User k. The multiuser controller decides the target error rate ε[k] and the transmission rate r_t[k] of User k. The buffer dynamics of each user are identical to those of the single-user counterpart described in Section II-A2.
To minimize the latency of the K users jointly, we associate positive weights ω_k, k = 1, ..., K, with the users. The multiuser latency minimization problem is then where ε_max[k] is the maximum allowed target error rate (minimum reliability) of User k, and SINR_{t,n}[k] is the receiver SINR on the n-th subcarrier in Frame t for User k.
Here, the buffer length q_t[k] and the buffer overflow b_t[k] of User k are given by (7) and (8), respectively.
To detect signals from the K users, the base-station applies receive beamforming. Let the matrix H_n ∈ C^{M×K} denote the uplink small-scale channel fading between the M-antenna base-station and the K users on Subcarrier n. Throughout this section, we assume that the user channels follow i.i.d. Rayleigh fading. Finally, the base-station receives inter-cell interference, modeled as additive white Gaussian noise of power p_I that is independent of the estimated channel.
Let the estimated channel and the estimation error be Ĥ_n and H̃_n, respectively. With the MMSE estimator, the estimation error H̃_n is independent of Ĥ_n, and the base-station applies the zero-forcing receive beamformer (Ĥ_n^H Ĥ_n)^{−1} Ĥ_n^H. On Subcarrier n, the received signal of User k follows [11], [12], where z and z_I denote the receiver noise and the inter-cell interference, respectively. Similar to past work [28], [29] on retransmission, we compute the SINR by treating the interference as worst-case Gaussian noise. The effective SINR for User k on Subcarrier n is then given by (28), where [·]_kk denotes the k-th diagonal element of a matrix. A crucial property of the SINR_n term (28) is that the randomness of both the channel variation and the interference is concisely captured by the inverse of the estimated channel, which is a random matrix.
For a practical uplink system where each user is unaware of other users' channel or queue information, the joint target error rate and transmission rate adaptation design appears intractable. To see the difficulty, consider the following example. For each user, the inter-beam interference in (28) depends on the other users' large-scale fading and transmission power. Recall that each user's transmission power changes in each frame based on its current queue length. Thus, it is extremely difficult for each user, with only local knowledge (its queue length and large-scale fading), to infer the exact value of this interference term and hence the proper transmission power. As a result, the target error rate and transmission rate policy cannot be designed in a distributed manner by each user, which is undesirable for a practical uplink system.
Here, we proceed with the observation that, in real-world systems, the pilot power is usually required to be higher than the data signal power [5]. Hence, the inter-beam interference term in (28) is upper bounded by K/τ, which can be viewed as a worst-case interference penalty. Each user then adjusts its power based on this SINR-loss upper bound. Substituting the SINR expression (28) of the multiuser system into (6), we obtain the target error rate (29), where the per-antenna gain κ_n is given by (30). Similar to the single-user case, we also compute the per-frame transmission power (31) as a function of the scheduled reliability target (target error rate) ε and the transmission rate r (in units of packets). The approximation in (29) arises because each user uses the upper bound on the inter-beam interference.
The per-antenna gain (30) is independent of the large-scale channel, the transmission power, and hence the queue length of the other K − 1 users. For each user, the distribution of the effective channel η in (12) then becomes independent of the channel, queue length, and power of the other users. Therefore, the multiuser problem can be decoupled: by adopting a new distribution of the effective channel gain η (generated by (30)) and the new power mapping (31), the multiuser problem decouples into K independent single-user problems (14), each of which can be solved by Algorithm 1. We now show that the large-array analytical results in Section IV also apply to the considered multiuser systems.
Theorem 4. For multiuser uplink systems, LYRRC becomes
As M → ∞, for positive τ[k] and ρ[k] ∈ [0, 1), each user operating under LYRRC achieves the minimum latency of
Proof. Using random matrix theory, we prove the result by following steps similar to those in the single-user case. The key step is to compute the mean and variance of (30). Please see Appendix D for the proof.
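The key step of the proof, the mean and variance of the per-antenna gain, can be explored numerically. Since the exact form of (30) is not reproduced here, the sketch below makes a labeled simplifying assumption: it studies the normalized zero-forcing gain κ = 1/(M[(ĤᴴĤ)⁻¹]_kk) for an estimated channel Ĥ with i.i.d. CN(0, 1) entries, ignoring the estimation-error and interference terms. For this model, 1/[(ĤᴴĤ)⁻¹]_kk is Gamma distributed with shape M − K + 1 and unit scale, so κ has mean (M − K + 1)/M and variance (M − K + 1)/M², i.e., it concentrates around 1 as M grows. The helper name `zf_gain_samples` is hypothetical.

```python
import numpy as np


def zf_gain_samples(M, K, trials=2000, seed=0):
    """Sample kappa = 1 / (M * [(H^H H)^{-1}]_{11}) for the first user.

    Simplified model (assumption): H stands in for the estimated channel and
    has i.i.d. CN(0, 1) entries; estimation error and interference are ignored.
    """
    rng = np.random.default_rng(seed)
    H = (rng.standard_normal((trials, M, K))
         + 1j * rng.standard_normal((trials, M, K))) / np.sqrt(2)
    gram = np.conj(np.swapaxes(H, 1, 2)) @ H        # batch of K x K Gram matrices
    inv_diag = np.linalg.inv(gram)[:, 0, 0].real     # first diagonal entry of the inverse
    return 1.0 / (M * inv_diag)


K = 4
for M in (16, 64, 128):
    kappa = zf_gain_samples(M, K)
    print(f"M={M:3d}  mean={kappa.mean():.3f} (theory {(M - K + 1) / M:.3f})  "
          f"var={kappa.var():.4f} (theory {(M - K + 1) / M**2:.4f})")
```

The shrinking variance is the kind of concentration that a Chebyshev-type argument can exploit in the large-array limit.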
Theorem 4 therefore shows that LYRRC provides asymptotically latency-optimal target error rate and transmission rate policies for the multiuser massive MIMO system. Theorem 4 also characterizes the minimum latency of each user.
In conclusion, for any non-negative weights ω_k, we can convert the K-user optimization problem into K parallel single-user problems. For finite M, Algorithm 1 solves each single-user problem and provides the optimal target error rate and transmission rate policy. Furthermore, LYRRC, operated distributedly by each user, is asymptotically latency-optimal.
We end this section by discussing some possible extensions of the multiuser system analysis.
The first extension is to general multiuser MIMO systems with user correlation. In massive MIMO, the user channels are expected to become mutually orthogonal as M increases, which is usually referred to as "favorable propagation" [11], [12]. Favorable propagation is expected to hold in massive MIMO systems [11], [12] and has been verified by recent massive MIMO measurements [35], [36]. However, in small-scale multiuser systems, user channels may be significantly correlated, and the multiuser scheduling problem cannot be fully decoupled. While spatially multiplexing correlated users reduces the SINR, multiplexing only uncorrelated users can lead to longer queueing latency. Hence, we expect a latency-minimizing scheduler to balance the tradeoff between longer queueing time and smaller SINR.
The second extension is to model pilot contamination and base-station array correlation, both of which can reduce the SINR. Pilot contamination [11], [12] is caused by pilot reuse and leads to both non-coherent and coherent interference. In particular, without proper pilot decontamination, coherent interference can grow linearly with the number of base-station antennas. Recent research [12], [37] demonstrates that, via multicell joint transmission, a massive MIMO system can reject the coherent interference if the covariance matrices of pilot-sharing users are asymptotically linearly independent. Under the same condition, [12], [37] show that the effective SINR can grow linearly in M without bound, even with pilot contamination and base-station array correlation. Therefore, it is reasonable to use a finite p_I to model the power of the residual inter-cell interference after pilot decontamination.
Finally, we leave the latency-minimizing transmission control of multicell systems with pilot contamination and base-station array correlation as important future work. Note that [12], [37] show that the SINR can also grow linearly with M, which implies that the mean of the per-antenna gain would be lower bounded by a positive constant. Computing the variance condition and finding the optimal transmission control for this generalized setup is beyond the scope of this paper. To evaluate the impact of spatial correlation, we use over-the-air measured channels in Section VI.

VI. NUMERICAL RESULTS
In this section, we use measured and simulated channels to confirm our analysis in Section III and Section V. In the numerical evaluation, latency is reported in seconds, obtained by multiplying the latency measured in frames by the frame duration. We measure the over-the-air channels between mobile clients and a 64-antenna massive MIMO base-station with the Argos system [2] on the campus of Rice University.
Figures 6a and 6b show the Argos array and the over-the-air measurement setup. We measured the 2.4 GHz Wi-Fi channel (20 MHz, 52 non-empty data subcarriers) for four pedestrian users in non-line-of-sight environments, whose locations are shown in Fig. 6c. For each user, we take channel measurements over 7900 frames on all subcarriers. The effective measured SNR between each mobile user and each base-station antenna is higher than 15 dB. In simulations, we treat the measured over-the-air channel traces as the perfect channel.
The base-station adopts an MMSE estimator to estimate the channel from τ uplink pilots, each of power 20 dBm, sent by the users. Using the estimated channel, the base-station generates zero-forcing receive beamformers to decode the signal of each user. The users are assumed to follow an average power constraint of 20 dBm with large-scale fading of −10 dB. The maximum buffer length B is 10. Packets arrive uniformly over time at a rate of 5 packets per frame, and the packet size L is 52 bits per OFDM symbol. The latency penalty of packets dropped due to buffer overflow is 0.5 s, and each self-contained frame has a duration of 0.25 ms. The target error rate is selected from the set {0.01%, 0.02%, ..., 0.09%} ∪ {0.1%, 0.2%, ..., 0.9%} ∪ {1%, 2%, ..., 20%}. Each user is under a maximum target error rate constraint of 3.16%, which is equivalent to the 5G URLLC reliability constraint of 99.9999% (over 1 ms). The power of the inter-cell interference equals the receiver noise floor.
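The equivalence between the 3.16% per-transmission target and 99.9999% reliability over 1 ms appears to follow from simple arithmetic, assuming one transmission attempt per 0.25 ms frame and independent decoding errors across attempts: 1 ms holds four frames, and (10^{−1.5})^4 = 10^{−6}. The constant names and the helper `frames_to_ms` below are illustrative, not part of the evaluation code.

```python
FRAME_MS = 0.25      # frame duration used in the evaluation
WINDOW_MS = 1.0      # URLLC latency budget
EPS_MAX = 0.0316     # maximum per-transmission target error rate


def frames_to_ms(latency_frames, frame_ms=FRAME_MS):
    """Convert a latency measured in frames into milliseconds."""
    return latency_frames * frame_ms


attempts = round(WINDOW_MS / FRAME_MS)    # one transmission attempt per frame
residual = EPS_MAX ** attempts            # all attempts fail (independence assumed)
print(attempts, f"{residual:.2e}", f"{1 - residual:.6%}")
print(frames_to_ms(20))                   # a 20-frame latency in ms: 5.0
```

The same conversion shows that a 5 ms latency corresponds to 20 frames of 0.25 ms.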
Fig. 7 compares the latency performance of four different policies over the measured channels and simulated i.i.d. Rayleigh fading channels. The blue lines are the optimal array-latency curves under the proposed joint reliability and transmission rate adaptation, obtained by Algorithm 1. The red lines correspond to the proposed low-complexity LYRRC (23) discussed in Section IV. The green lines show the latency under optimal transmission rate adaptation with fixed reliability (target error rate of 10%). The black lines show the latency with fixed reliability (10% target error rate) and transmission rate adaptation under a peak power constraint, which is currently deployed in LTE and Wi-Fi systems.
Over both measured and simulated channels, the proposed joint control (blue and red lines) clearly provides better latency performance than the two fixed-reliability counterparts. Adapting the target error rate to the number of antennas M reduces the latency significantly. Compared with the fixed target error rate under peak power control, a 400× latency reduction is observed when M > 30. Additionally, when M is larger than 30, the proposed joint control provides a 20× latency reduction compared with the state-of-the-art control that fixes the target error rate and adapts the transmission rate [20]-[23] (based on the number of antennas and queue length). The large-array asymptotically latency-optimal control, LYRRC, turns out to be near latency-optimal when M is larger than 30. Finally, we find that policies that fix the target error rate at 10% lead to at least 5 ms latency and cannot satisfy the URLLC latency requirement.
Fig. 7 also captures the influence of imperfect channel state information on latency. For a multiuser uplink system, the inter-beam interference (30) decreases as the number of pilots τ increases, and achieving the same target error rate becomes more power-expensive with larger inter-beam interference. Therefore, over both measured and simulated channels, the latency increases as τ decreases. Fig. 7 further demonstrates that the spatial correlation of the base-station antennas increases the minimum achievable latency: with the same number of pilots τ, a lower latency is observed in i.i.d. Rayleigh fading channels than in measured channels. The increased latency can be explained by the reduced system capacity due to spatial correlation [11], [12]. We further remark that LYRRC achieves near-optimal latency performance over both measured and simulated channels when M > 36.
We now comment on the target error rate that minimizes the latency. Fig. 8a shows the latency-optimal target error rate obtained when solving the latency minimization problems in Fig. 7. The latency-optimal target error rate increases as τ decreases due to less accurate channel estimation, which agrees with LYRRC. Additionally, due to the reliability constraint, the latency-optimal target error rates satisfy the 5G reliability requirement (a target error rate of at most 3.16%).
Finally, we use simulations to verify our structural analysis in Section IV. Fig. 7 confirms that LYRRC (23) is near latency-optimal for M larger than 38. One technical contribution independent of the massive MIMO setting is the simple transmission rate adaptation µ_l(q) = min(q, 2λ), referred to as the "rule of double", which is part of LYRRC. Lemma 1 shows that, when the buffer size B → ∞, the latency resulting from µ_l and a target error rate ε < 0.5 is $1 + \frac{\epsilon}{1-2\epsilon}$. Fig. 8b shows the latency resulting from µ_l with a finite buffer size. The (large-buffer) asymptotic latency accurately approximates the system latency when B is larger than 30. As the target reliability increases (the target error rate decreases), buffer overflow becomes less likely and the approximation in Lemma 1 becomes increasingly accurate.

VII. CONCLUSION
In this work, we study latency-optimal cross-layer control over wideband massive MIMO channels. By identifying a tradeoff between queueing and retransmission latency, we find that a lower physical layer target error rate does not always guarantee lower latency. We present algorithms that generate the optimal target error rate and transmission rate policies. We show that, to achieve the minimum latency, the target error rate can no longer be considered fixed and needs to be adapted to the number of base-station antennas, the channel estimation accuracy, and the traffic arrival rate. Our results also demonstrate that massive MIMO systems have the potential to achieve both high reliability and low latency and are a promising candidate for 5G URLLC.

APPENDIX A PROOF OF THEOREM 1
We use a per-packet argument. Since an infinite buffer is assumed in this section, no packet is dropped, and every packet is eventually received successfully after a variable number of transmissions due to potential channel-induced errors. For any target error rate ε, let r be the average number of retransmissions. The sum of the retransmission latency and the transmission time equals which is a lower bound on the total latency because the queueing latency is ignored. To finish the proof, we now lower bound ε under the long-term power constraint P. In the steady state, the average transmission rate equals the packet arrival rate, i.e., The power function (13) is convex in r. We can apply Jensen's inequality and (20) to obtain a lower bound on the average transmission power as The function F_η^{−1} is an inverse CDF and is non-decreasing. From (36), the is lower bounded as Using the monotonicity of the CDF, a lower bound on the target error rate is then We finish the proof by combining (37) and (34).
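As a worked illustration of the per-packet argument, suppose (as a simplifying assumption) that decoding errors are i.i.d. across transmissions with probability ε and that each transmission occupies one frame. The number of transmissions T of a packet is then geometric, and its mean gives the queueing-free transmission-plus-retransmission time in frames. The second line compares it with the rule-of-double latency $\frac{1-\epsilon}{1-2\epsilon} = 1 + \frac{\epsilon}{1-2\epsilon}$ from Lemma 1:

```latex
\mathbb{E}[T] = \sum_{k=1}^{\infty} k\,\epsilon^{k-1}(1-\epsilon)
  = \frac{1}{1-\epsilon} = 1 + \frac{\epsilon}{1-\epsilon},
\\[4pt]
\frac{1-\epsilon}{1-2\epsilon} - \frac{1}{1-\epsilon}
  = \frac{(1-\epsilon)^2 - (1-2\epsilon)}{(1-\epsilon)(1-2\epsilon)}
  = \frac{\epsilon^{2}}{(1-\epsilon)(1-2\epsilon)} .
```

The difference is of order ε², so both quantities approach 1 frame together as ε → 0.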

APPENDIX B PROOF OF LEMMA 1
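We compute the queueing latency by considering the steady state. Under the transmission rate adaptation µ_l, the buffer-length process (7) can be rewritten as
$$q_{t+1} = \max\left[\,q_t + (2\cdot 1_t - 1)\,\lambda,\ \lambda\,\right],$$
where 1_t is the decoding-error indicator of Frame t. The buffer-length process under µ_l thus constitutes a Markov chain with countably infinite states {λ, 2λ, 3λ, ...} [39]. The distribution of 1_t is determined by the target error rate ε as Prob(1_t = 1) = ε and Prob(1_t = 0) = 1 − ε. The state transitions are shown in Fig. 5. Denote the steady-state probability of buffer length iλ by π_{iλ}. We then have
$$\pi_{\lambda} = (1-\epsilon)\left(\pi_{\lambda} + \pi_{2\lambda}\right), \qquad \pi_{i\lambda} = \epsilon\,\pi_{(i-1)\lambda} + (1-\epsilon)\,\pi_{(i+1)\lambda}, \quad i \ge 2,$$
where $\sum_{i=1}^{\infty} \pi_{i\lambda} = 1$. For ε < 1/2, the steady-state distribution is then computed as
$$\pi_{i\lambda} = \frac{1-2\epsilon}{1-\epsilon}\left(\frac{\epsilon}{1-\epsilon}\right)^{i-1}, \quad i = 1, 2, \ldots \tag{38}$$
Using (38) and Little's law, the average latency is then computed as
$$D = \frac{1}{\lambda}\sum_{i=1}^{\infty} i\lambda\,\pi_{i\lambda} = \frac{1-\epsilon}{1-2\epsilon} = 1 + \frac{\epsilon}{1-2\epsilon},$$
which completes the proof.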
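The steady-state law of this chain can be cross-checked numerically. The sketch below runs power iteration on the transitions of Fig. 5 and compares the result with the geometric law $\pi_{i\lambda} = (1-r)\,r^{i-1}$, $r = \epsilon/(1-\epsilon)$, and with the Lemma 1 latency $(1-\epsilon)/(1-2\epsilon)$. Truncating the chain at a large level with a reflecting top state is our own assumption for computation and is not part of the proof; the function name is hypothetical.

```python
def stationary_truncated(eps, levels=60, iters=3000):
    """Power iteration for the Fig. 5 chain on levels i = 1..levels (q = i * lambda).

    Illustrative assumptions: an error (prob. eps) moves the queue up one level,
    a success moves it down one level but never below level 1, and the chain is
    truncated at `levels` with a reflecting top level.
    """
    pi = [1.0 / levels] * levels
    for _ in range(iters):
        new = [0.0] * levels
        for j, p in enumerate(pi):          # list index j holds level j + 1
            new[min(j + 1, levels - 1)] += eps * p
            new[max(j - 1, 0)] += (1 - eps) * p
        pi = new
    return pi


eps = 0.2
pi = stationary_truncated(eps)
latency = sum((j + 1) * p for j, p in enumerate(pi))   # E[q] / lambda, in frames
print(round(pi[0], 6), round((1 - 2 * eps) / (1 - eps), 6))
print(round(latency, 6), round((1 - eps) / (1 - 2 * eps), 6))
```

Because the truncated mass beyond 60 levels is of order r^60, the numerical and closed-form values agree to many digits for moderate ε.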

APPENDIX C PROOF OF THEOREM 2
We characterize the gap between the latency under LYRRC and the minimum latency as where the last step is obtained by applying Theorem 1 and (37). Equation (39) characterizes the latency gap. To finish the proof, it suffices to show that the average power constraint P is satisfied under the large-array simple control.
With utilization factor ρ (20), the packet arrival rate scales as λ = (ρN log M)/L. Using the per-frame power (13) and the definition of ε_o (23), the transmission power with rate r is We want to show that the power constraint is satisfied, i.e., P_{ε_o, µ_l} ≤ P. Using (40), the second power consumption term of (41) is upper bounded as where the last step is by the definition of the CDF. We now upper-bound (44) as follows.
Here, the last term denotes the probability that κ deviates from its mean by more than E[κ] − exp(Nx). Using Chebyshev's inequality, a new upper bound is obtained as where the last step follows from conditions (18) and (19). By the definition of ε_o, using the above upper bound proves (43) and completes the proof.
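To illustrate how loose, yet sufficient, the Chebyshev step can be, the sketch below uses an assumed toy model (not the paper's κ, whose exact definition and conditions (18) and (19) are not reproduced here): a single-user per-antenna gain κ = ‖h‖²/M with h ~ CN(0, I_M), so κ ~ Gamma(M, 1/M) with mean 1 and variance 1/M. For a threshold t < 1, Chebyshev gives Pr(κ ≤ t) ≤ Var(κ)/(1 − t)² = 1/(M(1 − t)²), which decays as 1/M. The function name is hypothetical.

```python
import random


def chebyshev_lower_tail(M, t, trials=20_000, seed=1):
    """Compare Pr(kappa <= t) with its Chebyshev bound for kappa ~ Gamma(M, 1/M).

    Assumed model: kappa = ||h||^2 / M with h ~ CN(0, I_M), so E[kappa] = 1 and
    Var[kappa] = 1/M. Requires t < 1 (a deviation below the mean).
    """
    rng = random.Random(seed)
    hits = sum(rng.gammavariate(M, 1.0 / M) <= t for _ in range(trials))
    empirical = hits / trials
    bound = (1.0 / M) / (1.0 - t) ** 2   # Var[kappa] / (E[kappa] - t)^2
    return empirical, bound


for M in (8, 32, 128):
    emp, bnd = chebyshev_lower_tail(M, t=0.5)
    print(f"M={M:4d}  Pr(kappa<=0.5)~{emp:.4f}  Chebyshev bound={bnd:.4f}")
```

The empirical tail falls much faster than the bound, but the bound alone already vanishes as M grows, which is the property such a proof step needs.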

Fig. 2: Single-user uplink system consisting of a single-antenna user and an M-antenna base-station.

Fig. 5: Evolution of the queue length q_t under any target error rate ε ∈ (0, 1) and the transmission rate adaptation µ* as a Markov chain. If ε ≥ 0.5, the average queue length, and hence the queueing latency, is infinite.

Fig. 6: Argos [2] massive MIMO base-station and the over-the-air measurement setup. The background map of Fig. 6c is generated by Google Maps [38]. The black antenna icons denote the locations of the mobile users.

Fig. 8: Fig. 8a shows the computed target error rate that provides the minimum latency in the measured channels; the resulting minimum latencies are shown in Fig. 7 (in blue). Fig. 8b verifies the latency characterization under the "rule of double" in Lemma 1.

o 1 − o P o (2λ) ≤ o P o (2λ) ≤ o M ρ . (42)
Therefore, the sufficient condition (41) is equivalent to
$$\lim_{M\to\infty} \epsilon_o \exp(\rho \log M) = \lim_{M\to\infty} \epsilon_o M^{\rho} = 0. \tag{43}$$
Before proving (43), we first present an upper bound on ε_o. The effective channel gain η (12) is the average of N i.i.d. random variables log κ. For x < 0, we thus have the upper bound
$$F_\eta(x) = F_{\sum_{n=1}^{N}\log\kappa_n}(Nx) \le F_{\log\kappa}(Nx) = \Pr\left(\kappa \le \exp(Nx)\right),$$

Algorithm 1: Latency-Optimal Joint Target Error Rate and Transmission Rate Control
Input: Average power constraint P, number of antennas M, number of subcarriers N, distribution of packet arrival a, large-scale channel gain γ, CDF of effective channel gain η, number of pilots τ, pilot power p_τ.
Output: Optimal target error rate ε*, optimal transmission rate adaptation µ*, minimum achievable latency D*.
for ε ∈ E with ε ≤ ε_max do (find the minimum latency for each ε)
    β_min = 0, β_max = z (z is a very large but finite number)
    while β_min/β_max < 1 − δ do

Fig. 8 panels: (a) Latency-optimal target error rate versus number of base-station antennas M; (b) Latency under µ_l (23) with finite buffer length.