URLLC Mode Optimal Resource Allocation to Support HARQ in 5G Wireless Networks

Ultra Reliable Low Latency Communication (URLLC) mode is a promising 5G technology for various real-time applications such as autonomous vehicles, augmented reality, and factory automation. Several papers on URLLC exist; however, most focus only on ways to achieve high reliability and lack a mathematical analysis of the delay related to packet errors, retransmissions, and bandwidth. In this paper, an M/G/1 queuing model is applied to analyze the erroneous transmission recovery delay of URLLC multi-user services. Using this model, the minimum required bandwidth is derived and applied in an adaptive control scheme. The proposed Pollaczek-Khinchine (P-K) formula based quadratic optimization (PFQO) scheme optimizes the bandwidth requirement by controlling the maximum retransmission parameter of the hybrid automatic repeat request (HARQ) mechanism in URLLC. Simulation results show the bandwidth saving effect of the proposed PFQO scheme for various signal to interference plus noise ratios (SINRs) and packet length distributions.


I. INTRODUCTION
Ultra Reliable Low Latency Communication (URLLC) 5G technology was designed to support packet delivery with minimum latency and high reliability, which is required for real-time automation applications that must satisfy high levels of quality of service (QoS) to be precise and maintain stability. This makes URLLC appropriate for autonomous system services such as drone control, autonomous vehicle platooning, and factory automation. In URLLC systems, the end-to-end (E2E) delay should be less than 1 ms and the packet error rate (PER) needs to be less than $10^{-5}$ [1]. Many technologies have been suggested to meet these requirements. The 5G New Radio (NR) is designed to use flexible numerology (e.g., sub-carrier spacing and mini-slots) to allocate resources efficiently for various services like URLLC, Enhanced Mobile Broadband (eMBB), and Massive Machine Type Communications (mMTC). In 5G, resources can be allocated in mini-slots, which are much smaller units than the slots allocated in Long-Term Evolution (LTE) [2]. In addition, feedback signals can be sent within the same slot using self-contained slots in NR to reduce the delay. Similar to LTE, hybrid automatic repeat request (HARQ) can be applied in NR for reliability improvement. Many papers on URLLC have proposed schemes to improve the reliability and latency simultaneously using techniques like early HARQ feedback prediction with machine learning [3] and blind retransmission HARQ [4], where results show that the feedback delay spent waiting for an ACK/NACK can become too large to satisfy the low-latency constraints. Therefore, these papers propose schemes that send retransmission messages in advance using channel state information prediction to reduce the latency in HARQ.

The associate editor coordinating the review of this manuscript and approving it for publication was Faisal Tariq.
In order to satisfy the QoS requirements, URLLC packets have higher priority over other applications, which is implemented using schemes like puncturing eMBB transmissions [5]. The studies of [6] and [7] propose schemes where URLLC transmissions puncture eMBB transmissions in a way that has little impact on the eMBB transmissions. In [6], Conditional Value at Risk (CVaR) is used to measure the risk of eMBB transmissions, and the Markov inequality is used to maintain the URLLC reliability. The studies in [7] and [8] propose a scheme that decomposes the resource allocation problem into two sub-problems, one for URLLC and one for eMBB transmission, and solves each sub-problem. Applying network slice technology for URLLC applications makes maintaining URLLC and eMBB QoS more effective. In [9], by controlling the modulation coding scheme (MCS) and retransmission resource parameters, the authors minimize the total resource consumption based on reliability constraints. The authors of [10] divide the use of one-shot transmission and the HARQ scheme considering the reliability, block error rate (BLER), and required channel use. In [11], a scheme is proposed that controls the maximum number of transmissions and minimizes the bandwidth (based on a finite block length assumption of the system operation) using the square root staffing rule. However, these papers do not perform a URLLC protocol based queuing delay analysis or consider the scenario where the URLLC packets can be transmitted on demand. These factors have to be considered when there are multiple URLLC users, such as in the control of drone swarms or autonomous vehicle platoons. This paper deals with the mathematical analysis of the reliability, latency, and allocated bandwidth controlled by a 5G next generation Node B base station (gNB) in order to satisfy URLLC requirements. The waiting time of the packets at the gNB and the service time are derived based on the Pollaczek-Khinchine (P-K) formula [12].
The optimization of the required bandwidth for stable URLLC operation is derived, which is controlled through adaptation of the maximum number of transmissions. The contributions of this paper can be summarized as follows:
• To minimize the HARQ URLLC bandwidth requirement, a P-K formula based quadratic optimization (PFQO) scheme is proposed.
• The optimal bandwidth requirement to support URLLC HARQ is derived using queuing delay vacation and gating effect modeling considering multiple URLLC users and the 5G packet structure.
• To enhance the accuracy, compared to other papers, the formulation is based on the HARQ end-to-end delay, which includes the queuing delay and transmission delay.
• In order to deal with the HARQ packet length influence, uniform and beta distributions are considered in the 5G gNB antenna resource allocation M/G/1 model.
• Based on the use of network slice technology, this paper provides derivations to determine how much bandwidth needs to be allocated to a URLLC slice, considering NR specifications and the queuing delay. These derivations could be implemented in future 5G schedulers to enhance the URLLC performance.
• The performance of PFQO is compared to the bandwidth square root staffing rule based optimization (SSRO) scheme of [11], which uses the square root staffing rule to optimize the URLLC bandwidth.

II. SYSTEM MODEL AND PROBLEM FORMULATION
In the analysis, 5G gNB base stations use the 3 GHz and mmWave frequency bands (based on the Release 16 3GPP standards on URLLC [1]) to transmit the packets to multiple users, and resources are allocated through the slot based NR numerology system. The gNB transmits $\lambda$ finite block length packets of $L_i$ bits to the $i$th user equipment (UE) among $M$ users, denoted as $U_i$ ($i = 1, 2, \ldots, M$). In order to transmit a packet of $L_i$ information bits, the channel use factor $r_i$ required for a finite block length transmission is given by (1), where $C(\psi) = \log_2(1 + \psi)$ is the channel capacity based on an infinite block length, $V(\psi) = (\log_2 e)^2 \left(1 - \frac{1}{(1+\psi)^2}\right)$ is the channel dispersion, $\psi_i$ is the signal to interference plus noise ratio (SINR) for each user, $p$ is the decoding failure probability, and $Q^{-1}(x)$ is the inverse Q function of $x$ [11]. It is assumed that there are $C$ discrete classes of users, where the packets of each class have (nearly) the same SINR [11]. As the gNB analyzes the Channel State Information (CSI), the packets of each class are stacked and serviced in queueing order (corresponding to the NR numerology), as shown in Fig. 1.
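The dependence of the channel use on packet length, SINR, and decoding failure probability can be illustrated with a short Python sketch. It assumes the widely used normal approximation for finite block lengths, $L \approx rC(\psi) - \sqrt{rV(\psi)}\,Q^{-1}(p)$, solved for $r$; this may differ in detail from the expression in (1). The function names and the example values (200 bits, 20 dB, $p = 10^{-3}$) are illustrative assumptions.

```python
from math import e, log2, sqrt
from statistics import NormalDist


def capacity(snr):
    """Infinite block length capacity C(psi) = log2(1 + psi), bits per channel use."""
    return log2(1.0 + snr)


def dispersion(snr):
    """Channel dispersion V(psi) = (log2 e)^2 * (1 - 1 / (1 + psi)^2)."""
    return (log2(e) ** 2) * (1.0 - 1.0 / (1.0 + snr) ** 2)


def q_inv(p):
    """Inverse Gaussian Q function: Q^{-1}(p) = Phi^{-1}(1 - p)."""
    return NormalDist().inv_cdf(1.0 - p)


def channel_uses(bits, snr, p_fail):
    """Channel uses r solving bits = r*C - sqrt(r*V)*Q^{-1}(p_fail).

    Assumed normal approximation; it is a quadratic in x = sqrt(r):
    C*x^2 - a*x - bits = 0 with a = sqrt(V) * Q^{-1}(p_fail).
    """
    c = capacity(snr)
    a = sqrt(dispersion(snr)) * q_inv(p_fail)
    x = (a + sqrt(a * a + 4.0 * c * bits)) / (2.0 * c)
    return x * x


snr_linear = 10 ** (20 / 10)  # 20 dB expressed as a linear ratio
r_example = channel_uses(200, snr_linear, 1e-3)
print(f"channel uses for 200 bits at 20 dB: {r_example:.2f}")
```

Under this approximation, a stricter failure probability or a lower SINR increases $r$ above the infinite block length value $L/C(\psi)$.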

A. NR NUMEROLOGY
The sub-carrier spacing and time slot duration can be adjusted with flexible numerology in 5G systems. In the system model, the channel use factor $r_i = k_i s_i h_i$ is required to send a packet, where $k_i$ is the number of channel uses per unit bandwidth and per unit time, $s_i$ is the period of the transmission time, and $h_i$ is the allocated bandwidth for transmission [11]. At the gNB, a bandwidth of $W = E[k_i h_i]$ is required to transmit the packets within the time period. The data rate can be obtained from $\eta W$, where $\eta$ is the spectral efficiency in units of bits/s/Hz, which is generally in the range of 3.24 to 11.25 bits/s/Hz for mmWave communications [13]. The URLLC system may adaptively reduce the spectral efficiency $\eta$ to lower the error rate and maintain the wireless link reliability.
In a normal cyclic prefix (CP) slot, 14 symbols are allocated per slot. Compared to LTE (which allocates radio resources in blocks of 14 OFDM symbols, i.e., 1 TTI/subframe), 5G NR more efficiently allocates units of $l$ symbols (e.g., 2, 4, or 7), called mini-slots, per slot in the downlink [14], as shown in Table 1. While the frame and subframe durations are fixed regardless of the Sub-Carrier Spacing (SCS), the symbol duration becomes shorter as the SCS increases. Correspondingly, the duration to transmit a packet is represented as $s_i = \frac{l}{14} \cdot 2^{-m}$ ms and the subcarrier spacing is $f = 15 \cdot 2^{m}$ kHz, where $m = 0, 1, 2, 3, 4$ [15].
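The two numerology relations above are simple to evaluate. The following Python sketch computes the sub-carrier spacing and the mini-slot transmission time for a given numerology index $m$ and mini-slot length $l$; the function names are illustrative.

```python
def subcarrier_spacing_khz(m):
    """Sub-carrier spacing f = 15 * 2^m kHz for numerology index m in 0..4."""
    if m not in range(5):
        raise ValueError("numerology index m must be one of 0, 1, 2, 3, 4")
    return 15 * 2 ** m


def transmission_time_ms(l, m):
    """Mini-slot duration s = (l / 14) * 2^(-m) ms for l symbols out of 14."""
    if not 1 <= l <= 14:
        raise ValueError("a normal-CP slot holds between 1 and 14 symbols")
    if m not in range(5):
        raise ValueError("numerology index m must be one of 0, 1, 2, 3, 4")
    return (l / 14) * 2.0 ** (-m)


for m in range(5):
    print(m, subcarrier_spacing_khz(m), transmission_time_ms(7, m))
```

For example, a 7-symbol mini-slot at $m = 2$ (60 kHz spacing) lasts 0.125 ms, while a full 14-symbol slot at $m = 0$ lasts 1 ms.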

B. QUEUING MODEL
It is assumed that, at a gNB, the packets to be sent arrive according to a Poisson distribution, and because each user uses a different application with a different packet length, the service time needs to be modeled using a general distribution [11]. Since packets are transmitted one by one, the system is modeled using an M/G/1 queue model as in [16]. M/G/1 is a queuing model in which packet arrivals follow a Poisson distribution, the service time follows a general distribution, and packets are serviced one by one. In Fig. 2, when a packet is generated, the packet is placed in the queue of the gNB. The queuing delay and the transmission delay occur at the gNB. If the receiver fails to decode the packet, it returns a NACK to request a retransmission, up to a maximum of $\varphi$ times, and a feedback delay occurs on each round. The total delay should be less than the delay constraint and can be expressed as (2), which does not include the time spent in the retransmissions.
In (2), $T_{prop}$ is the propagation delay and $T_{proc}$ is the packet processing time, which can be modeled as a deterministic value. When the uplink resources are sufficient, the feedback delay can also be considered a constant value [11].
Since the gNB is maintained as a stable system in steady state, the waiting time and the arrival rate can be considered as average values. Furthermore, the number of packets in the queue can be calculated based on Little's theorem [12]. At the gNB, $\lambda$ packets/ms are generated and sent to $M$ users. In the HARQ scheme, packet decoding can fail at the receiver, which leads to a retransmission. The number of transmission attempts is limited by the maximum number of retransmissions $\varphi$, based on the delay constraints. There are various HARQ schemes that can be considered, such as Incremental Redundancy HARQ (IR-HARQ) and Chase Combining HARQ (CC-HARQ).
In this paper, the chase-combining model is analyzed based on the assumption that the gNB retransmits the same packet if it is erroneously received. The total average downlink arrival rate at the gNB is denoted $\lambda_{down}$ and is expressed in terms of $\varphi$ and $p_k$, where $\varphi$ is the maximum number of retransmission attempts and $p_k$ is the Packet Error Rate (PER) at the $k$th transmission ($p_\varphi < \ldots < p_2 < p_1$). Based on the block-fading channel model, $p_k$ is represented as in [17], where $g$ is the parameter derived according to the Modulation and Coding Scheme (MCS) and $\psi_M$ is the maximum SINR when the decoding failure probability is 1, which can be calculated depending on the MCS. In the CC-HARQ scheme, the receiver combines the first $k - 1$ received packets and uses them to decode the $k$th packet using Maximum Ratio Combining (MRC). According to the PASTA property [12], the average downlink arrival rate in each retransmission round is the same as $\lambda_{down}$. The service rate is $\mu$, the average service time is $\bar{X} = 1/\mu$, the second moment of the service time is $\overline{X^2}$, the mean reservation interval is $\bar{V}$, and the variance of the vacations is $\sigma_v^2$. Since URLLC packets can be transmitted in the reservation interval and in the data interval of other packets to achieve low latency, vacations need to be considered based on an exhaustive system model. The reservation scheduling control (e.g., gated system) of the URLLC system was modeled as an exhaustive system because a transmission packet that arrives during the user's reservation interval can be transmitted during that same interval.

VOLUME 8, 2020

III. QUEUING MODEL BASED DELAY ANALYSIS
The system load is expressed as $\rho = \lambda_{down}/\mu = \lambda_{down}\bar{X}$, and $\rho < 1$ is required to maintain a stable queuing system. In the M/G/1 queue model, the P-K formula is applied to derive the waiting time at the gNB, and $T_{queue}$ applies to the exhaustive system model [12], [18].

A. QUEUING DELAY AND SERVICE TIME
According to the P-K formula in the M/G/1 system [18], the queuing delay $T_{queue}$ can be expressed as (4), which includes the P-K term $\frac{\lambda_{down}\overline{X^2}}{2(1-\rho)}$. In order to derive the expectations of $r$ and $r^2$, an approximation using the expectations of $L$ and $\sqrt{L}$ can be applied using (1).
The distribution of the packet length ($L_i$ bits) can be obtained from traffic monitoring, or approximated using a packet length distribution function, and thus $E[L_i]$ and $E[\sqrt{L_i}]$ can be obtained. In addition to the queuing delay, the transmission delay needs to be derived to analyze the overall delay, and it can be simplified as $T_{trans} = \bar{X}$.
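The delay terms of this subsection can be sketched in Python. The sketch assumes the textbook waiting time of an exhaustive multi-user reservation system, $\frac{\lambda_{down}\overline{X^2}}{2(1-\rho)} + \frac{(M-\rho)\bar{V}}{2(1-\rho)} + \frac{\sigma_v^2}{2\bar{V}}$, which may differ in detail from (4). It also assumes that the service time scales as $\bar{X} = E[r_i]/W$ and $\overline{X^2} = E[r_i^2]/W^2$, which is consistent with the stability bound $\lambda_{down}E[r_i] < W$. Function and parameter names are illustrative.

```python
def pk_queue_delay(lam, ex, ex2, users=1, v_mean=0.0, v_var=0.0):
    """Mean waiting time of an M/G/1 queue, optionally with vacations.

    lam    : arrival rate (packets per ms)
    ex     : mean service time (ms)
    ex2    : second moment of the service time (ms^2)
    users  : number of users M sharing the reservation cycle
    v_mean : mean vacation (reservation) interval (ms)
    v_var  : variance of the vacation interval (ms^2)
    """
    rho = lam * ex
    if rho >= 1.0:
        raise ValueError("unstable queue: rho must be below 1")
    wait = lam * ex2 / (2.0 * (1.0 - rho))  # P-K residual service term
    if v_mean > 0.0:
        # Assumed exhaustive multi-user vacation terms.
        wait += (users - rho) * v_mean / (2.0 * (1.0 - rho))
        wait += v_var / (2.0 * v_mean)
    return wait


def delay_for_bandwidth(lam, er, er2, w, **vacation):
    """T_queue + T_trans when the service time is r / W (assumed scaling)."""
    ex, ex2 = er / w, er2 / (w * w)
    return pk_queue_delay(lam, ex, ex2, **vacation) + ex


# Hypothetical numbers: 100 packets/ms, E[r] = 40 and E[r^2] = 1700 channel uses.
print(delay_for_bandwidth(100.0, 40.0, 1700.0, 8000.0, users=10, v_mean=0.01))
```

With these assumptions, the delay decreases as $W$ grows and approaches $M\bar{V}/2$ for very large $W$.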

B. VACATION
In order to model the multi-user system's queuing delay, the time durations in which a user cannot access the channel because other users are transmitting are modeled using vacation intervals. The vacation pattern of the downlink system can be modeled based on the slot configuration, which is determined when allocating downlink and uplink symbols. When $\tau$ symbols of the uplink mini-slots are set in the slot format, the vacation of the system can be represented as $\bar{V} = \frac{\tau}{14}$ ms ($\tau = 0, 1, 2, \ldots, 14$), where the vacation also depends on the probability that the uplink request is idle. The slot formats supported in the 5G system are defined in the 3GPP standardization, and the system can be assumed to follow fixed slot formats. Therefore, the uplink request can be assumed to be periodic, where the constant vacation intervals lead to $\sigma_v^2 = 0$.

C. DELAY BOUNDS BASED ON THE URLLC REQUIREMENTS
To satisfy the URLLC latency and reliability requirements, the constraints in (7) are applied, where $\varphi$ is the maximum number of retransmission attempts, $p_k$ is the decoding failure probability at the $k$th transmission, $T_d$ is the delay constraint, and $\delta$ is the PER constraint. The latency and reliability constraints are represented in (7a) and (7b). The total delay (the sum of the transmission time, queuing delay, and feedback delay, multiplied by $(\varphi+1)$ in (7c)) needs to be less than the delay constraint of 1 ms. Since $T_{prop}$ and $T_{proc}$ are deterministic values, the feedback delay $T_f$ is also constant, and therefore the inequality of (8) is obtained.
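As a rough illustration of how these constraints split across transmission rounds, the Python sketch below makes two assumptions. First, every round has the same decoding failure probability $p$ (the homogeneous case used in Section V), with the reliability constraint read as $p^{\varphi+1} \le \delta$. Second, the latency constraint is read as $(\varphi+1)(T_{trans}+T_{queue}+T_f) \le T_d$, so each round may spend at most $T_d/(\varphi+1) - T_f$ on queuing and transmission. The exact forms of (7) and (8) may differ, and the function names are illustrative.

```python
def per_round_failure_target(delta, phi):
    """Largest per-round failure probability p with p^(phi + 1) <= delta."""
    return delta ** (1.0 / (phi + 1))


def per_round_delay_target(t_d, t_f, phi):
    """Budget for T_queue + T_trans in each of the phi + 1 rounds (ms)."""
    return t_d / (phi + 1) - t_f


T_D_MS = 1.0      # delay constraint T_d
T_F_MS = 0.125    # feedback delay T_f (value used in Section V)
DELTA = 1e-6      # PER constraint (value used in Section V)

# Retransmission limits that leave a positive per-round delay budget.
feasible_phi = [phi for phi in range(8)
                if per_round_delay_target(T_D_MS, T_F_MS, phi) > 0.0]

for phi in feasible_phi:
    print(phi,
          per_round_failure_target(DELTA, phi),
          per_round_delay_target(T_D_MS, T_F_MS, phi))
```

Under these assumptions, a larger $\varphi$ relaxes the per-round failure probability but tightens the per-round delay budget, and with $T_f = 0.125$ ms no budget remains once eight transmissions are allowed.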

IV. MINIMUM BANDWIDTH REQUIREMENT OPTIMIZATION
In order to derive the minimum bandwidth requirement of W based on the maximum retransmission attempt φ and system constraints, the optimization problem is formulated as (9).
From (9e), the bandwidth lower bound becomes $\lambda_{down} E[r_i] < W$ to maintain stability. Therefore, based on $f = 15 \cdot 2^{m}$ kHz, the lower bound $W_m$ for a stable queuing system can be expressed as in (10).
In order to simplify the formulation, the mapping to the coefficients $A$, $B$, and $C$ in (11) is applied to the optimization function, together with the discriminant $D = B^2 - 4AC$, where $M$ is the number of users, $\bar{V}$ is the average of the vacation, and $\lambda_{down}$ is the arrival rate of packets at the gNB.
Lemma 1: If $A > 0$, there is no $W^*$ that satisfies the URLLC requirement.
Proof: In order to derive the optimum bandwidth $W^*$, two conditions should be considered. First, according to (9b), $T_{queue} + T_{trans}$ needs to be less than $T_{target}$ to satisfy the latency constraint. Second, $W^*$ should be larger than $\lambda_{down} E[r_i]$ to maintain a stable queuing system. In order to consider the first condition, $T_{queue} + T_{trans}$ is expressed in terms of $\bar{X}$, $\overline{X^2}$, and $\bar{V}$. This can be calculated in terms of $E[r_i]$, $E[r_i^2]$, and $W$ based on (5) and (6).
Considering the stable system condition of $\rho < 1$ based on (9e), $W^2 - \lambda_{down} E[r_i] W > 0$. As $W$ increases, the service time and queuing delay decrease and $\lim_{W \to \infty} (T_{queue} + T_{trans}) = \frac{M\bar{V}}{2}$; therefore, the lower bound of $T_{queue} + T_{trans}$ is $\frac{M\bar{V}}{2}$. If $M\bar{V} > 2T_{target}$ (i.e., $A > 0$), the lower bound of $T_{queue} + T_{trans}$ is larger than $T_{target}$. Therefore, there is no $W^*$ that can satisfy the URLLC requirement in this case.
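The quadratic structure behind Lemma 1 can be made explicit with a short derivation. It is a sketch under stated assumptions: the waiting time has the exhaustive multi-user form $\frac{\lambda_{down}\overline{X^2}}{2(1-\rho)} + \frac{(M-\rho)\bar{V}}{2(1-\rho)}$ with $\sigma_v^2 = 0$, and the service time scales as $\bar{X} = E[r_i]/W$ and $\overline{X^2} = E[r_i^2]/W^2$. The resulting coefficients reproduce the condition $A > 0 \Leftrightarrow M\bar{V} > 2T_{target}$ stated in the proof, but they may differ from the mapping in (11) by a scaling.

```latex
\begin{align*}
T_{queue} + T_{trans}
  &= \frac{\lambda_{down} E[r_i^2]}{2W\,(W - \lambda_{down} E[r_i])}
   + \frac{(MW - \lambda_{down} E[r_i])\,\bar{V}}{2\,(W - \lambda_{down} E[r_i])}
   + \frac{E[r_i]}{W} \;\le\; T_{target}. \\
\intertext{Multiplying both sides by $2W\,(W - \lambda_{down} E[r_i]) > 0$ and collecting powers of $W$ gives $AW^2 + BW + C \le 0$ with}
A &= M\bar{V} - 2\,T_{target}, \\
B &= E[r_i]\,\bigl(2 + 2\lambda_{down} T_{target} - \lambda_{down}\bar{V}\bigr), \\
C &= \lambda_{down}\,\bigl(E[r_i^2] - 2\,E[r_i]^2\bigr).
\end{align*}
```

In this form the sign of $A$ depends only on the vacation load $M\bar{V}$ and the delay target, which is why no bandwidth can help when $A > 0$.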
Lemma 2: If $A < 0$ and $D \le 0$, then $W^* = W_m$. If $A < 0$ and $D > 0$, then $W^*$ is the maximum of $W_m$ and the larger solution of $AW^2 + BW + C = 0$ when $W_m$ is larger than the smaller solution; otherwise, $W^* = W_m$.
Proof: Similar to the proof of Lemma 1, (12) is used to optimize the bandwidth. The quadratic relation is expressed in (13).
The relation of (13) can be expressed in the simplified form $AW^2 + BW + C \le 0$ in terms of $W$. Since any value of $W$ satisfies the requirement when $A < 0$ and $D \le 0$, $W^*$ only needs to be above $W_m$. Therefore, $W^* = W_m$. If $A < 0$ and $D > 0$, the optimization needs to be analyzed separately according to $W_m$ and the smaller solution of the quadratic equation. If $W_m$ is larger than the smaller solution of the quadratic equation, then $W^*$ is determined by the larger solution of the quadratic equation and $W_m$, that is, $W^*$ is the maximum of the two. Otherwise, $W^*$ can be set in the range below the smaller solution of the quadratic equation. Since $W^*$ only needs to be above $W_m$, $W^* = W_m$.
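The case analysis of Lemmas 1 and 2 can be written as a small Python function. This is a minimal sketch: the name `optimal_bandwidth` is illustrative, the coefficients and the stability bound $W_m$ are taken as inputs, and the boundary case $A = 0$, which the lemmas do not cover, is conservatively treated as infeasible here (an assumption).

```python
from math import sqrt


def optimal_bandwidth(a, b, c, w_min):
    """Smallest W >= w_min with a*W^2 + b*W + c <= 0, following Lemmas 1 and 2.

    Returns None when no feasible bandwidth exists.
    """
    if a >= 0:
        # Lemma 1 covers a > 0; a == 0 is treated as infeasible (assumption).
        return None
    d = b * b - 4.0 * a * c  # discriminant D
    if d <= 0:
        # The downward parabola never rises above zero: any W is feasible.
        return w_min
    root_1 = (-b + sqrt(d)) / (2.0 * a)
    root_2 = (-b - sqrt(d)) / (2.0 * a)
    smaller, larger = min(root_1, root_2), max(root_1, root_2)
    if w_min > smaller:
        # W_m lies between or above the roots: move up to the larger root if needed.
        return max(w_min, larger)
    # W_m is at or below the smaller root, where the inequality already holds.
    return w_min


# Example: -W^2 + 5W - 6 <= 0 holds for W <= 2 or W >= 3.
print(optimal_bandwidth(-1.0, 5.0, -6.0, 2.5))
```

In the example, a stability bound of 2.5 falls between the two roots, so the bandwidth is raised to the larger root.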
The derived optimal bandwidth $W^*$ is allocated to the queue of each class. In reality, since the bandwidth is not infinite, $W^*$ should be less than the bandwidth upper limit. Based on [19], a resource grid is used for an antenna port and the number of resource blocks is fixed, meaning that the maximum bandwidth is fixed for each class as listed in Table 2. As the required bandwidth is evaluated in Section V, the optimal bandwidth is less than the maximum bandwidth indicated in Table 2. Furthermore, even if the channel state is worse or the packet length is longer, a larger bandwidth can be supported by Carrier Aggregation (CA). According to [20] and [21], both FR1 (from 410 MHz to 7,125 MHz) and FR2 (from 24.25 GHz to 52.6 GHz) support CA of up to 16 carriers. Since the upper bound is larger than the required bandwidth in the performance evaluation, the system can be maintained stably.

TABLE 2. The number of maximum resource blocks and maximum bandwidth at a resource grid.

Algorithm 1
…
5: $T_{total} = (\varphi + 1)(T_{trans} + T_{queue} + T_f)$
6: if $T_{total} > 1$ ms then
7:     break
8: else
9:     compute $A$, $B$, $C$ based on (11)
10:     if $A > 0$ then
11:         no $W^*$ solution exists
12:     else
13:         compute $W^*$ based on Lemma 2
…

The optimization problem of (9) can be solved by Algorithm 1. After computing the decoding failure probability and the parameters $A$, $B$, and $C$, the optimal bandwidth is derived based on Lemmas 1 and 2 according to the sign of $A$. This procedure is repeated while the total delay is less than 1 ms, as $\varphi$ is increased by 1. Since the complexity of (11) can be calculated in terms of the number of users $M$, and the equations consist of simple additions and multiplications, the complexity of computing (11) is $O(M)$. After calculating $A$, $B$, and $C$, the problem can be solved using the quadratic formula and the comparisons in Lemmas 1 and 2, which has linear complexity. This problem should be solved at each transmission attempt.
Although the maximum retransmission attempt φ affects the complexity to compute the optimal solution, the number of permitted transmission attempts will be few, due to the URLLC delay constraints. Therefore, the complexity of this optimization problem is O(M ), which shows linear complexity characteristics.

V. PERFORMANCE ANALYSIS
In this section, the bandwidth required by the proposed PFQO scheme to satisfy the QoS requirements is compared to that of the SSRO scheme of [11] in terms of packet arrival rate, packet length, and SINR. Because the PFQO scheme uses the derived optimum bandwidth formula, the required bandwidth is adapted accurately to the limit on the maximum number of transmissions, and this effectiveness is demonstrated in the figures below. In the evaluation, the decoding failure probability is assumed to be the same in each transmission round; namely, repetition coding and the homogeneous transmissions model of [11] are applied to compare the performance with the SSRO scheme (i.e., $p_1 = p_2 = \ldots = p_{\varphi+1}$).
The performance of PFQO and SSRO is simulated using MATLAB (R2019a), where the maximum number of transmissions is controlled to minimize the bandwidth requirement based on the number of users $M = 10$, $T_f \approx 0.125$ ms, delay constraint $T_d = 1$ ms, PER constraint $\delta = 10^{-6}$, and the number of channel uses per unit time and per unit bandwidth $k = 1$. The maximum number of transmissions, including both the original transmission and all retransmissions, is set from 1 to 7 in consideration of the one-shot transmission and the delay constraint. The parameter values used in the simulation are summarized in Table 3.
The analysis in Fig. 3 was conducted while ignoring the vacations in the uplink, where the packet arrival rate ranges from 10 to 100 packets/ms, the SINR is 20 dB, and the packet length $L_i$ has a uniform distribution over [150, 250] bits. If the maximum number of transmissions increases, the allowed decoding failure probability of each round increases, which reduces the redundancy and channel use needed under the fixed maximum allowed URLLC latency. However, since too many retransmission attempts require more expected bandwidth for each attempt, the results show convex features.
URLLC has a recommended packet length limit of 32 bytes for evaluation purposes, and within this range, each application has a unique packet length distribution. Based on this consideration, Fig. 4 compares the performance results when the packet length has a uniform or Beta distribution (based on the parameters $\alpha$ and $\beta$), with $\tau = 2$ uplink symbols in a slot, an idle probability of 0.01, a packet arrival rate of $\lambda = 100$ packets/ms, and an SINR of 10 dB. Since URLLC packets have a maximum and minimum packet length, the Beta distribution is more suitable for approximating the packet length distribution than the Normal distribution, which has a long tail [22].
The analysis in Fig. 5 is based on $\tau = 2$ uplink symbols in a slot, an idle probability of 0.01, a packet arrival rate of 100 packets/ms, and a packet length $L_i$ with a uniform distribution over [100, 200] bits. Fig. 3 shows that PFQO requires less bandwidth than SSRO while satisfying the URLLC requirements, where the performance gain of PFQO increases as the packet arrival rate increases. This is because SSRO blocks and drops a packet if there is not sufficient bandwidth to support the transmission immediately, while PFQO places the packet in the transmission queue of the gNB, which has a bandwidth saving effect. Furthermore, the PFQO optimized formula enables accurate instantaneous adaptation based on the current status of the URLLC system. Fig. 4 shows how the bandwidth requirement increases as the packet size increases for the uniform and Beta distributions. Changing $\alpha$ and $\beta$ of the Beta distribution changes the mean and variance of the channel usage, which affects the bandwidth requirements. This is because when $\alpha$ is larger than $\beta$, the probability of larger packets being generated increases, which influences the bandwidth requirement. Fig. 5 shows how the bandwidth requirement decreases as the SINR increases, which is due to the smaller number of transmissions required to meet the URLLC reliability level.
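The effect of the distribution parameters on the packet length statistics can be checked with the closed-form moments of a Beta distribution rescaled to a bounded length range. The Python sketch below is illustrative only: the function names and the range of 0 to 256 bits (32 bytes) used in the example are assumptions, not the simulation settings of this paper.

```python
def uniform_moments(l_min, l_max):
    """Mean and variance of a packet length uniform on [l_min, l_max] bits."""
    return (l_min + l_max) / 2.0, (l_max - l_min) ** 2 / 12.0


def scaled_beta_moments(alpha, beta, l_min, l_max):
    """Mean and variance of l_min + (l_max - l_min) * Beta(alpha, beta)."""
    span = l_max - l_min
    total = alpha + beta
    mean_unit = alpha / total
    var_unit = alpha * beta / (total ** 2 * (total + 1.0))
    return l_min + span * mean_unit, span ** 2 * var_unit


# Hypothetical range: 0 to 256 bits (the 32-byte limit).
for alpha, beta in [(2.0, 4.0), (3.0, 3.0), (4.0, 2.0)]:
    mean, var = scaled_beta_moments(alpha, beta, 0.0, 256.0)
    print(f"alpha={alpha}, beta={beta}: mean={mean:.1f} bits, variance={var:.1f}")
```

A Beta(1, 1) distribution reduces to the uniform case, and choosing $\alpha > \beta$ shifts the mean toward the upper end of the range, so longer packets become more likely.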

VI. CONCLUSION
In this paper, the proposed PFQO scheme optimizes the bandwidth considering HARQ in 5G URLLC communication. In comparison with other papers, the proposed scheme uses a queuing delay model based on the P-K formula while including the influence of the transmission time. The analysis of this paper formulates the relation between the allocated bandwidth and the total delay, and proposes a more accurate delay model of the URLLC mode. The simulation results show that the bandwidth requirement of PFQO is smaller than that of the SSRO scheme. For example, at a rate of 100 packets/ms, the proposed PFQO scheme requires 32.3% to 62.9% less bandwidth than the SSRO scheme over the range of 1 to 7 maximum transmissions. In addition, even at a much lower rate of 10 packets/ms, the proposed PFQO scheme requires 34.1% to 64.8% less bandwidth than the SSRO scheme over the same range of transmission attempts. For future research, multi-server systems can be investigated by applying an M/G/m model to the URLLC system performance analysis. Such modeling is expected to enable higher levels of performance analysis for 5G URLLC systems. In addition, detailed HARQ feedback delay mechanisms, such as blind retransmission or the ACK timing in Rel-16 (e.g., the K1 parameter), will be studied.