Scheduling of Heterogeneous Services by Resolving Conflicts

Fifth generation (5G) new radio introduced flexible numerology to accommodate heterogeneous services. However, optimizing the scheduling of heterogeneous services with differing delay and throughput requirements over 5G new radio is a challenging task. In this paper, we investigate near-optimal, low-complexity scheduling of radio resources for ultra-reliable low-latency communications (URLLC) when coexisting with enhanced mobile broadband (eMBB) services. We demonstrate that maximizing the sum throughput of eMBB services while serving URLLC users over a fixed-length time-frequency grid is, in the long term, equivalent to minimizing the number of URLLC placements in the grid; this is a consequence of reducing the number of infeasible placements for eMBB, which we refer to as “conflicts.” To meet this new objective, we evaluate the performance of new, conflict-aware heuristics, consisting of a family of “greedy” heuristics and a lightweight heuristic inspired by bin packing optimization, all of which achieve near-optimal performance. Moreover, having shed light on the impact of conflicts in layer-2 scheduling, we investigate non-orthogonal multiple access (NOMA) as a potential approach for conflict resolution, leveraging superposition coding. Extensive numerical results showcase the superior performance of NOMA with respect to orthogonal multiple access (OMA), owing to its ability to resolve conflicts.


I. INTRODUCTION
The International Telecommunication Union (ITU) has defined new requirements and capabilities for 5G mobile communication systems to support a wide variety of new devices and services with diverse quality of service (QoS) requirements and characteristics [3]. The 3rd Generation Partnership Project (3GPP) standardized 5G in the form of a novel radio interface technology, referred to as new radio (NR) [4]. 5G NR introduced a flexible numerology and frame structure to accommodate heterogeneous service requirements, supporting various values of subcarrier spacing and symbol/frame duration. Optimizing resource allocation in the NR numerology setting to deliver heterogeneous QoS requirements remains a challenging task [5]-[9].
In 5G and beyond, ultra-reliable low-latency communication (URLLC) services with extreme delay constraints will coexist with enhanced mobile broadband (eMBB) services [10], which require very high bit rates (gigabits per second) and have moderate latency requirements (a few milliseconds) [11]. Moreover, URLLC services are currently expected to have lower traffic volumes than eMBB services [12], although this will not hold in the future for applications such as virtual reality and haptics. In this framework, the design of radio resource allocation strategies for URLLC traffic coexisting with eMBB has been a focal point of recent research efforts [13]-[16].
The associate editor coordinating the review of this manuscript and approving it for publication was Prakasam Periasamy.
In this direction, two approaches have been adopted by the 3GPP. The first is based on a “puncturing” framework: eMBB traffic is initially scheduled at the beginning of the slots; upon arrival of URLLC traffic, the latter is prioritized and dynamically overlaid onto mini-slots of ongoing eMBB transmissions, which are punctured (i.e., dropped). In the second approach, known as preemptive scheduling, resources are reserved for URLLC in advance, before the demands are placed [1], [17], [18].
Based on puncturing scheduling, the studies in [8], [19]-[22] considered resource allocation strategies for the coexistence of URLLC and eMBB. The authors in [8] consider three types of models (threshold, linear, and convex) to describe the eMBB data rate loss associated with the incoming URLLC traffic. Furthermore, the authors in [20] propose a punctured scheduling approach for transmitting low-latency traffic multiplexed on a shared channel with eMBB. Another approach is proposed in [21], where a risk-sensitive model is introduced to ensure URLLC allocation while also minimizing the losses of eMBB users. However, these strategies can result in significant data rate losses for eMBB services and may also impact eMBB transmission reliability [23]. Targeting this problem, the authors of [22] proposed a scheduling approach that maximizes the minimum expected achieved rates and fairness, accounting for the expected values of the traffic. Alternatively, the authors in [1] studied the resource allocation of eMBB and URLLC services by preemptively reserving resources for URLLC. Such solutions ensure advantageous conditions for URLLC packets when they are generated, at the cost of wasting resources in the absence of URLLC transmissions [9].
A flexible numerology and frame structure was explicitly considered in [1] by defining a time-frequency resource grid containing several types of resource blocks of different shapes, spanning different time durations and frequency ranges. Exploiting this flexibility to optimize the resource allocation to different services while ensuring their QoS requirements was shown to be an NP-hard problem. Optimizing resource allocation over a flexible numerology and frame structure, while avoiding the assignment of overlapping blocks that cause collisions (i.e., puncturing), remains a challenging task.
In this paper, following the works in [1] and [8], we consider a two-dimensional grid of resource blocks with different sizes in the time and frequency domains. We study the problem of identifying the resource allocation that maximizes the eMBB sum rate under the constraint of serving all URLLC throughput demands, with latency constraints ranging from 0.5 to 2 milliseconds (msec). In our previous work [24], we introduced a bin packing approach that minimizes the number of resource block placements for the URLLC services in order to minimize the infeasible placements for the eMBB services. In this work, we extend this framework by proposing a second approach based on low-complexity algorithms that account for infeasible placements. We also extend our previous bin packing approach with a preprocessing step that evaluates the overall feasibility of the grid. We note that in our formulation, unlike in [22] and other published work, we do not account for the expected traffic but rather for the actual traffic in an appropriate time-frequency grid. This approach has also been followed in [1], among others.
Furthermore, we consider and formulate an alternative non-orthogonal multiple access (NOMA) scheduling proposal. In particular, we introduce a 2-user NOMA scheme based on pairing eMBB and URLLC services at the mini-slot level and show that the proposed approach achieves higher eMBB sum rates than optimal orthogonal scheduling, as it avoids puncturing and preemptive allocation. Finally, we evaluate the performance of the proposed algorithms under several numerologies (fixed, flexible, and multiple). More precisely, the main contributions of this work are outlined below:
1) We first reformulate the problem of eMBB throughput maximization, introducing URLLC conflict minimization in the objective function. The novel concept of “conflict” captures the penalties that arise because orthogonal multiple access (OMA) does not allow resources to overlap; as a result, OMA scheduling incurs a large number of infeasible resource allocation combinations. To the best of our knowledge, our earlier conference paper [24] is the first in the literature to introduce conflict-aware solutions for the problem at hand.
2) Next, we propose three conflict-aware, multi-numerology radio resource allocation heuristics to maximize scheduling efficiency for URLLC when coexisting with eMBB services. Functions of i) the average, ii) the instantaneous (placement-specific), or iii) the aggregate conflict are used to normalize the throughput utility function, thereby penalizing placements that increase conflicts. We argue, and showcase through extensive simulation results, that employing the proposed utilities improves the performance of algorithms proposed in the literature, such as that in [1].
3) Subsequently, we adopt a completely different approach that combines high accuracy with low computational complexity. We treat the scheduling problem as a specific instance of bin packing optimization, solved by minimizing the placements of URLLC services in the time-frequency resource grid; to this end, we propose to group the resource blocks into different categories with respect to URLLC demands. Within each category, we solve a knapsack maximization of the sum eMBB throughput. Our proposal builds on previous results in [24], [25] and is inspired by the refined-first-fit family of heuristics for bin packing problems. Simulation results show that the novel heuristic algorithm, of complexity O(N log N), provides a lightweight and near-optimal solution to the scheduling of URLLC when coexisting with eMBB.
4) Furthermore, having clarified the importance of minimizing conflicts between different services, we identify NOMA schemes [26], [27] as competitive candidates for interference management. NOMA allows the superposition of services, even at the mini-slot level, by employing superposition coding at the transmitter and successive interference cancellation at the receivers [28], [29]. Although most works on NOMA focus on its increased spectral efficiency to showcase its superiority over OMA, we further provide strong motivation for adopting NOMA as a conflict mitigation approach in scheduling problems. An extensive set of numerical results, investigating the performance of 2-user and multiple-user NOMA for both fixed and flexible numerology, shows significant gains in terms of sum eMBB throughput when adopting NOMA in a flexible numerology setting.
The rest of this paper is organized as follows: Section II presents the resource allocation optimization problem along with an equivalent formulation as a conflict minimization problem. Conflict-aware heuristic algorithms are described in Section III, while the problem reformulation when using NOMA is presented in Section IV. Section V presents numerical results showing the near-optimal performance of the proposed heuristics, as well as the superiority of NOMA for URLLC and eMBB coexistence, for both flexible and fixed numerologies. Finally, conclusions are given in Section VI.

II. PROBLEM FORMULATION
We first provide a review of basic concepts in 5G NR flexible numerology and detail the considered scheduling problem.
A. BACKGROUND ON 5G NR FLEXIBLE NUMEROLOGY
5G NR Release-15 [4] defines a flexible numerology with subcarrier spacing (SCS) of 15, 30, and 60 kHz below 6 GHz, and 60 and 120 kHz above 6 GHz, compared to long-term evolution (LTE), which uses a fixed numerology with an SCS of 15 kHz below 6 GHz. 5G NR also defines a 10 msec frame, with each frame divided into 10 subframes of 1 msec, which are further divided into one or more slots depending on the SCS. A slot comprises 14 OFDM symbols for a configuration using normal cyclic prefix, or 12 OFDM symbols for extended cyclic prefix.
In 5G NR, the mini-slot duration is determined by the symbol duration, which is inversely proportional to the SCS so as to ensure the orthogonality of the subcarriers. Using a higher SCS decreases the symbol duration and hence the mini-slot duration, which is beneficial for lower latency [10]. URLLC traffic requires extremely low delays, often lower than 1 msec [30]. The URLLC latency requirements can only be satisfied if the transmission duration and round-trip time (RTT) are shorter than the corresponding latency constraint.
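The inverse relation between SCS and symbol/mini-slot duration can be made concrete with a short calculation. The sketch below uses the standard NR relations (SCS = 15·2^μ kHz, 2^μ slots per 1 msec subframe, 14 OFDM symbols per slot with normal CP and 12 with extended CP); the helper names and the simplification of averaging the CP overhead over the slot are our own assumptions, not part of the scheduling model of this paper.

```python
def numerology(mu: int, normal_cp: bool = True) -> dict:
    """Basic 5G NR numerology quantities for numerology index mu (0..4)."""
    scs_khz = 15 * 2 ** mu                  # subcarrier spacing in kHz
    symbols_per_slot = 14 if normal_cp else 12
    slot_ms = 1.0 / 2 ** mu                 # 2**mu slots per 1 msec subframe
    useful_symbol_us = 1e3 / scs_khz        # useful symbol duration 1/SCS, in microseconds
    return {
        "scs_khz": scs_khz,
        "symbols_per_slot": symbols_per_slot,
        "slot_ms": slot_ms,
        "useful_symbol_us": useful_symbol_us,
    }


def minislot_duration_ms(n_symbols: int, mu: int, normal_cp: bool = True) -> float:
    """Approximate duration of an n-symbol mini-slot (CP overhead averaged over the slot)."""
    nr = numerology(mu, normal_cp)
    return n_symbols * nr["slot_ms"] / nr["symbols_per_slot"]


if __name__ == "__main__":
    for mu in range(4):
        nr = numerology(mu)
        print(f"mu={mu}: SCS={nr['scs_khz']} kHz, slot={nr['slot_ms']} ms, "
              f"2-symbol mini-slot={minislot_duration_ms(2, mu):.4f} ms")
```

Doubling the SCS halves the duration of a mini-slot with a fixed number of symbols, which is why higher numerologies are attractive for URLLC latency budgets.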

B. SCHEDULING PROBLEM FORMULATION
We focus, in this work, on downlink scheduling, similarly to the majority of the existing related works in the literature, e.g., [13]-[23]. The system model and the scheduling framework follow the structure of [1], [8]. Time is divided into slots of 2 msec duration. Each slot serves both throughput-hungry (eMBB) and ultra-low-latency (URLLC) users, which must be served before the next slot; the latter must additionally satisfy specific latency requirements. Moreover, we assume that URLLC arrivals follow a Poisson distribution with parameter λ, while two cases are considered for the eMBB services: i) a full buffer model, and ii) $|\mathcal{K}^{(c)}| = |\mathcal{K}| - |\mathcal{K}^{(\ell)}|$. The objective is to find the resource allocation in each slot that maximizes the eMBB sum throughput while satisfying the throughput demands and latency constraints of the URLLC services. Finally, we use [2] as a tool to implement the time-frequency grid.
The terminology employed in the rest of the paper is tabulated in Table 1: $\mathcal{K}$ denotes the set of all services, $\mathcal{K}^{(c)}$ the set of eMBB users, $\mathcal{K}^{(\ell)}$ the set of URLLC users, $\mathcal{B}$ the set of all possible resource blocks according to the numerology employed, and $\mathcal{I}$ the set of all mini-slots. Moreover, $q_k$ denotes the throughput demand of service $k \in \mathcal{K}^{(\ell)}$, which has to be satisfied within a strict latency tolerance $\tau_k$. Additionally, we introduce the binary parameter $\alpha_{b,i}$, $b \in \mathcal{B}$, $i \in \mathcal{I}$, which indicates whether block $b$ includes the basic unit (mini-slot) $i$, in which case $\alpha_{b,i} = 1$; otherwise, $\alpha_{b,i} = 0$.
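To make the role of $\alpha_{b,i}$ concrete, the sketch below builds it for a toy grid in which each resource block is a rectangle of basic units indexed by (time, frequency) pairs, and tests whether two blocks overlap, i.e., share at least one basic unit. The rectangular indexing, the block names, and the function names are illustrative assumptions; this is not the grid tool of [2].

```python
from typing import Dict, Hashable, Set, Tuple

Unit = Tuple[int, int]  # (time index, frequency index) of a basic unit


def block_units(t0: int, n_t: int, f0: int, n_f: int) -> Set[Unit]:
    """Basic units covered by a rectangular block starting at (t0, f0)."""
    return {(t, f) for t in range(t0, t0 + n_t) for f in range(f0, f0 + n_f)}


def alpha_matrix(blocks: Dict[Hashable, Set[Unit]], units) -> Dict[Hashable, Dict[Unit, int]]:
    """alpha[b][i] = 1 if block b includes basic unit i, else 0."""
    return {b: {i: int(i in cover) for i in units} for b, cover in blocks.items()}


def overlaps(alpha, b, p) -> bool:
    """Blocks b and p overlap if some basic unit i has alpha[b][i] = alpha[p][i] = 1."""
    return any(alpha[b][i] and alpha[p][i] for i in alpha[b])


# Toy 4 (time) x 2 (frequency) grid with three hypothetical blocks
units = [(t, f) for t in range(4) for f in range(2)]
blocks = {
    "wide": block_units(0, 1, 0, 2),   # 1 time unit, 2 frequency units
    "long": block_units(0, 4, 1, 1),   # 4 time units, 1 frequency unit
    "small": block_units(1, 1, 0, 1),  # a single basic unit
}
alpha = alpha_matrix(blocks, units)
print(overlaps(alpha, "wide", "long"))   # True: both cover unit (0, 1)
print(overlaps(alpha, "wide", "small"))  # False
```

An overlap between two candidate blocks is exactly what makes them mutually exclusive under OMA, which is the root of the conflicts discussed below.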
In Table 2 we describe the most widely utilized resource block specifications for 5G NR, depicted in Fig. 1(a) and (b): resource blocks of shape 1 are shown in red, those of shape 2 in yellow, and those of shapes 3-4 in blue (employing flexible numerology, $\mathcal{K}^{(c)}$ and $\mathcal{K}^{(\ell)}$ can utilize any of the given shapes). To demonstrate the concept of conflict, in Fig. 1(a) we illustrate in gray shade the invalid placements for shapes 3-4 when a specific placement of shape 1 has taken place, while in Fig. 1(b) we show the invalid placements for blocks of shape 2 when an additional placement of shape 3-4 has been decided [4].
Now, we can define the achievable throughput of each block $b \in \mathcal{B}$ assigned to service $k \in \mathcal{K}$, denoted by $r_{b,k}$, which depends on the signal-to-interference-plus-noise ratio (SINR) and the configuration of the block (including the parameters in Table 2). More precisely, we first define the achievable Shannon rate of each mini-slot $i \in \mathcal{I}$ assigned to service $k \in \mathcal{K}$, as follows, where $N$ is the total number of subcarriers. Accounting for the impact of the cyclic prefix, the rate per mini-slot is given by, where $\eta_j = \frac{T_j}{T_j + T_{cp}}$, $T_j$ is the symbol duration of the $j \in \{1, 2, 3, 4\}$ block shape and $T_{cp}$ is the cyclic prefix (CP) length. Then, the achievable throughput of each resource block $b \in \mathcal{B}$ with respect to service $k \in \mathcal{K}$ can be expressed as follows, where $\mathrm{TTI}_j$ and $\mathrm{SCS}_j$ are the transmission time interval duration and the subcarrier spacing of the $j \in \{1, 2, 3, 4\}$ block shape, respectively. Additionally, $\mathbb{1}\{x\}$ is the indicator function for the logical proposition $x$. Note that the delay constraint is incorporated in the problem by requiring that the end time $t_b$ of block $b \in \mathcal{B}$ comply with the delay tolerance $\tau_k$ of service $k \in \mathcal{K}$; otherwise, the specific block is infeasible for the specific service.
As such, the latency tolerance constraints of the URLLC services need not appear explicitly in the problem formulation presented in the following. In the rest of the paper, $x_{b,k}$ denotes a binary variable that takes the value 1 if resource block $b \in \mathcal{B}$ is assigned to service $k$, and $x_{b,k} = 0$ otherwise.
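Because the exact expressions for the per-mini-slot rate and the block throughput are not reproduced here, the following is only a minimal sketch of how such a block throughput could be computed. It assumes a Shannon-type rate per subcarrier, $\mathrm{SCS}_j \log_2(1+\mathrm{SINR})$, scaled by the CP efficiency $\eta_j$ and by the block duration, and set to zero through the indicator when the block end time $t_b$ exceeds $\tau_k$. The functional form and all parameter names are assumptions for illustration only.

```python
import math


def cp_efficiency(symbol_duration: float, cp_length: float) -> float:
    """eta_j = T_j / (T_j + T_cp): share of airtime that carries useful symbols."""
    return symbol_duration / (symbol_duration + cp_length)


def block_throughput(sinr: float, n_subcarriers: int, scs_hz: float, tti_s: float,
                     symbol_duration: float, cp_length: float,
                     end_time_s: float, tau_s: float) -> float:
    """Hypothetical r_{b,k} in bits per block (ASSUMED form, not the paper's exact equation).

    sinr is linear (not dB); the indicator 1{t_b <= tau_k} zeroes blocks that end too late.
    """
    if end_time_s > tau_s:  # delay constraint violated: block infeasible for this service
        return 0.0
    eta = cp_efficiency(symbol_duration, cp_length)
    rate_bps = eta * n_subcarriers * scs_hz * math.log2(1.0 + sinr)  # assumed Shannon-type rate
    return rate_bps * tti_s


# Example: 15 kHz SCS (T_j = 1/15000 s, normal CP of about 4.69 us), 12 subcarriers, 1 msec TTI
eta = cp_efficiency(1 / 15000, 4.69e-6)
bits = block_throughput(sinr=10.0, n_subcarriers=12, scs_hz=15e3, tti_s=1e-3,
                        symbol_duration=1 / 15000, cp_length=4.69e-6,
                        end_time_s=1e-3, tau_s=0.5e-3)
print(round(eta, 3), bits)  # bits is 0.0 because the block ends after tau_k
```

Folding the delay check into $r_{b,k}$ in this way is what allows the latency constraints to disappear from the optimization problem itself.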
VOLUME 10, 2022
A common objective in eMBB and URLLC coexistence consists in maximizing the sum throughput of the $\mathcal{K}^{(c)}$ services under the constraint of satisfying the latency and throughput demands of $\mathcal{K}^{(\ell)}$, without any overlap between the allocated resource blocks. In other words, our goal is to find the resource allocation that satisfies the URLLC users' demands with minimal throughput losses for eMBB users, and subsequently to schedule all the remaining resource blocks to the eMBB services. The general problem formulation is given as follows:
$$
\begin{aligned}
\text{P0:}\ \max_{\{x_{b,k}\}}\ & \sum_{b\in\mathcal{B}}\sum_{k\in\mathcal{K}^{(c)}} r_{b,k}\,x_{b,k} && (4)\\
\text{s.t.}\ & \sum_{b\in\mathcal{B}} r_{b,k}\,x_{b,k} \ge q_k,\quad k\in\mathcal{K}^{(\ell)}, && (5)\\
& \sum_{b\in\mathcal{B}}\sum_{k\in\mathcal{K}} \alpha_{b,i}\,x_{b,k} \le 1,\quad i\in\mathcal{I}, && (6)\\
& x_{b,k}\in\{0,1\},\quad b\in\mathcal{B},\ k\in\mathcal{K}.
\end{aligned}
$$
In [1] it was proven that the combinatorial problem P0 is an NP-hard partition problem, and a heuristic algorithm was proposed, referred to in the rest of the paper as the baseline heuristic, which uses a utility matrix $u$ with elements $u_{b,k}$ that represent the utility of block $b \in \mathcal{B}$ assigned to service $k \in \mathcal{K}$. In the first step of the heuristic algorithm, block $b$ is allocated to the service $k \in \mathcal{K}^{(\ell)}$ with the maximum $u_{b,k}$; notice that choosing the allocation that maximizes the utility without simultaneously examining the “cost” of this placement in terms of generated conflict is clearly suboptimal. This step is iterated until all the demands of $k \in \mathcal{K}^{(\ell)}$ are satisfied under constraint (5). Next, in the second step, the $\mathcal{K}^{(c)}$ services are allocated using a similar principle, until no non-overlapping blocks remain. Hence, the placements of $\mathcal{K}^{(\ell)}$ and $\mathcal{K}^{(c)}$ are treated as two separate resource allocation problems. The complexity of the baseline heuristic algorithm was shown to be $O(|\mathcal{B}||\mathcal{K}|\log(|\mathcal{B}||\mathcal{K}|))$, without accounting for the computation of the utility matrices.
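For very small instances, P0 can be solved by exhaustive search, which is a convenient way to sanity-check heuristics. The sketch below enumerates every assignment of each block to either no service or one service, discards assignments that violate the non-overlap constraint (6) or a URLLC demand (5), and keeps the one with the highest eMBB sum throughput (4). The toy instance (mini-slots 0-3, five blocks, one URLLC service "u" and one eMBB service "e") is hypothetical; the search space grows as $(|\mathcal{K}|+1)^{|\mathcal{B}|}$, which is why heuristics are needed in practice.

```python
from itertools import product


def solve_p0_bruteforce(alpha, rate, urllc, embb, demand):
    """Exhaustive search for P0 on tiny instances.

    alpha[b]  : set of mini-slots covered by block b
    rate[b][k]: throughput r_{b,k}
    demand[k] : URLLC demand q_k
    Returns (best eMBB sum throughput, {block: service}) or (None, None) if infeasible.
    """
    blocks = sorted(alpha)
    options = [None] + list(urllc) + list(embb)  # None: block left unassigned
    best_value, best_assign = None, None
    for choice in product(options, repeat=len(blocks)):
        assign = {b: k for b, k in zip(blocks, choice) if k is not None}
        used, feasible = set(), True
        for b in assign:                      # constraint (6): no shared mini-slot
            if used & alpha[b]:
                feasible = False
                break
            used |= alpha[b]
        if not feasible:
            continue
        served = {k: sum(rate[b][k] for b, s in assign.items() if s == k) for k in urllc}
        if any(served[k] < demand[k] for k in urllc):   # constraint (5)
            continue
        value = sum(rate[b][k] for b, k in assign.items() if k in embb)  # objective (4)
        if best_value is None or value > best_value:
            best_value, best_assign = value, assign
    return best_value, best_assign


# Hypothetical toy instance: mini-slots 0..3
alpha = {"A": {0, 1}, "B": {2, 3}, "C": {0, 1, 2, 3}, "D": {0}, "E": {1}}
rate = {"A": {"u": 2, "e": 3}, "B": {"u": 2, "e": 3}, "C": {"u": 4, "e": 5},
        "D": {"u": 1, "e": 2}, "E": {"u": 1, "e": 2}}
print(solve_p0_bruteforce(alpha, rate, urllc=["u"], embb=["e"], demand={"u": 2}))
# (4, {'B': 'u', 'D': 'e', 'E': 'e'})
```

Placing URLLC on B rather than on A or C is optimal here precisely because it leaves the two small, high-value eMBB blocks D and E unconflicted.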
The baseline heuristic has been extended in [1] to incorporate other utility matrices, denoted by $u^{LP}, u^{LD} \in \mathbb{R}^{|\mathcal{B}|\times|\mathcal{K}|}$, which are the optimal solutions of the linear programming (LP) relaxation and the Lagrangian dual (LD) relaxation of P0, respectively. With these two new utilities, an extension of the baseline heuristic was proposed that runs the heuristic concurrently with both $u^{LP}$ and $u^{LD}$ and retains the better of the two results; this achieves near-optimal performance at the cost of high computational complexity, especially considering that the dual problem P0-LD also requires a subgradient method.
Discussing the above approach, whose basic principle (with few variations) can be found in other published work, e.g., [8], we notice that although the overall aim is to jointly maximize the throughput of $\mathcal{K}^{(c)}$ while meeting the demands of the $\mathcal{K}^{(\ell)}$ services, these two interwoven goals are treated separately: to satisfy constraint (5), the demands of URLLC services are met first, and only then do the placements of eMBB services take place.
Such policies solve P0 by accounting only for constraint (5), which is suboptimal, as they do not consider the impact of the $\mathcal{K}^{(\ell)}$ allocation on the subsequent allocation of the $\mathcal{K}^{(c)}$ services, i.e., constraint (6). We notice that previously proposed algorithms pursue a single optimization target at any instance: first maximizing the URLLC throughput and then maximizing the eMBB throughput. Building on this observation, we first show that the previously presented baseline heuristic can be improved if conflict is taken explicitly into account.
To this end, we introduce an explicit description of the impact that the assignment of any resource block to a specific service has on the feasible assignments of the remaining blocks. In other words, we account for the amount of conflict generated by any specific URLLC or eMBB resource block placement. To evaluate the impact of constraint (6) explicitly, we define the conflict as for $b, p \in \mathcal{B}$. As a next step we note that, where $R_{\text{total}}$ denotes the maximum sum throughput of the whole slot with respect to $\mathcal{K}^{(c)}$ and the second triple sum represents the losses in $\mathcal{K}^{(c)}$ throughput due to the conflicts generated by the placements of all services. Given that $R_{\text{total}}$ has a specific value (which can be explicitly evaluated) for any particular slot realization (based on the specific channel realizations), the maximization of (4) is equivalent, from an optimization point of view, to the minimization of the aggregate conflict, i.e., max Hence, the maximization of the sum eMBB throughput may be reduced to the minimization of the potential conflicts. We also note that: where $\mathbb{E}[\cdot]$ denotes expectation over the channel realizations, $\mathcal{C}$ is the set of conflicts when all resource blocks have the same average throughput $\bar{r} = \mathbb{E}[r_{b,k}]$, and $|\cdot|$ denotes cardinality; i.e., from (6) and (7) it emerges that we need, on average, to minimize the number of conflicts.
Considering these remarks, we propose novel heuristic algorithms for P0 that focus on minimizing the number of placements of $\mathcal{K}^{(\ell)}$ services. The first set of heuristics, dubbed conflict-aware greedy in the following, use “conflict”-enhanced variations of the utility proposed in the baseline heuristic and aim at closing the optimality gap. The second approach is built on an interpretation of (6) as a bin packing optimization problem [31]; based on this approach, we develop a lightweight scheduling approach that is shown to be near-optimal.
Furthermore, as the minimization of conflicts is shown to be an equivalent optimization objective to sum throughput maximization, we propose the use of NOMA to allow overlapping placements. The proposed heuristics and NOMA approaches are detailed in the next two sections.

III. HEURISTIC ALGORITHMS FOR CONFLICT RESOLUTION
A. CONFLICT-AWARE HEURISTIC SOLUTIONS
We first propose extensions of the baseline heuristic of [1], [8] by introducing penalties in URLLC resource allocations, expressed as functions of the conflict. To this end, we introduce two metrics for the conflict induced by the allocation of $\mathcal{K}^{(\ell)}$ services: the aggregate conflict $C^t_b$, which measures the total number of blocks overlapping with block $b$, and the average conflict $C^r_{b,k}$, which corresponds to the average throughput, for every service $k \in \mathcal{K}^{(\ell)}$, of the blocks $p \in \mathcal{B}$ that overlap with block $b \in \mathcal{B}$.
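Both conflict metrics can be computed directly from the block coverage. The sketch below counts, for each block $b$, the other blocks sharing at least one mini-slot with it ($C^t_b$), and averages the throughput $r_{p,k}$ of those overlapping blocks for a given service $k$ ($C^r_{b,k}$). The toy instance, the data layout, and the function names are illustrative assumptions.

```python
def overlapping(alpha, b):
    """Blocks p != b that share at least one mini-slot with block b."""
    return [p for p in alpha if p != b and alpha[p] & alpha[b]]


def aggregate_conflict(alpha):
    """C^t_b: number of blocks overlapping with block b."""
    return {b: len(overlapping(alpha, b)) for b in alpha}


def average_conflict(alpha, rate, services):
    """C^r_{b,k}: average r_{p,k} over the blocks p overlapping with b (0 if none)."""
    result = {}
    for b in alpha:
        neigh = overlapping(alpha, b)
        for k in services:
            result[b, k] = sum(rate[p][k] for p in neigh) / len(neigh) if neigh else 0.0
    return result


# Hypothetical toy instance: mini-slots 0..3, URLLC service "u"
alpha = {"A": {0, 1}, "B": {2, 3}, "C": {0, 1, 2, 3}, "D": {0}, "E": {1}}
rate = {"A": {"u": 2}, "B": {"u": 2}, "C": {"u": 4}, "D": {"u": 1}, "E": {"u": 1}}
print(aggregate_conflict(alpha))                         # {'A': 3, 'B': 1, 'C': 4, 'D': 2, 'E': 2}
print(average_conflict(alpha, rate, ["u"])[("B", "u")])  # 4.0: only C overlaps B
```

The large block C, which covers the whole toy grid, has the highest aggregate conflict, which is exactly the kind of placement a conflict-penalized utility should discourage.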
Using these new conflict measures, we propose three variations of the utility matrix to be used in solving P0:
• In the first variation, the utility becomes,
• In the second variation, the utility becomes,
• Finally, in the third variation, we use the following utility,
The utility matrix $u^{\text{last pl.}}_{b,k}$ is introduced to strike a “compromise” between the baseline and the conflict-aware approaches; notably, it considers the impact of the conflict only in the last $\mathcal{K}^{(\ell)}$ service placement, since our simulations revealed that this last placement usually requires more blocks to satisfy the demand constraint.
In Algorithm 1 we outline, in the form of pseudocode, the proposed conflict-aware heuristic. The algorithm consists of two phases: i) in Lines 1-10 we describe the allocation of the URLLC services, and ii) in Lines 11-14 the allocation of the eMBB services is described. First, in Line 2, we denote by $\mathcal{G}$ the set of resource blocks to be allocated to the URLLC services (initially $\mathcal{G} = \emptyset$). Set $\mathcal{G}$ is augmented with the pair $(b', k')$ that maximizes the conflict utility metric ($u^{\text{total}}$, $u^{\text{avg}}$, or $u^{\text{last pl.}}$) across all the available $b \in \mathcal{B}$ and $k \in \mathcal{K}^{(\ell)}$, as described in Line 3. A service $k \in \mathcal{K}^{(\ell)}$ is satisfied when its demand $q_k$ is met, and the first phase is concluded when all services are satisfied or no other available $b \in \mathcal{B}$ exists, according to Lines 4-6 and Lines 7-10, respectively.
Algorithm 1 Conflict-Aware Resource Allocation Algorithm (CA) Based on [1]
Input: $u^{(\ell)} = [u_{b,k}]$, $b \in \mathcal{B}$, $k \in \mathcal{K}^{(\ell)}$, utility matrix for $\mathcal{K}^{(\ell)}$ ($u^{\text{total}}$, $u^{\text{avg}}$, or $u^{\text{last pl.}}$), and $u^{(c)} = [r_{b,k}]$, $b \in \mathcal{B}$, $k \in \mathcal{K}^{(c)}$, utility matrix for $\mathcal{K}^{(c)}$.
Output: Block-service assignment $\mathcal{G}$.
Phase ($\mathcal{K}^{(\ell)}$ resource allocation):
1: repeat
2: Remove from $\mathcal{B}$ the blocks in $\mathcal{G}$ and the blocks overlapping with $\mathcal{G}$.
3: if $q_k$ is met then
5: The demand of the remaining users in $\mathcal{K}^{(\ell)}$ cannot be met.
10: end if
Phase ($\mathcal{K}^{(c)}$ resource allocation):
11: repeat
12: Remove from $\mathcal{B}$ the blocks in $\mathcal{G}$ and the blocks overlapping with $\mathcal{G}$.
13:
14: until $\mathcal{B} = \emptyset$
If all URLLC services are satisfied, the algorithm proceeds to the second phase. From the available blocks $\mathcal{B}$ we exclude the resource blocks used for the URLLC services and all blocks overlapping with them (Line 12). Finally, the eMBB services are allocated iteratively, according to the utility metric $u^{(c)}$.
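The two phases of Algorithm 1 can be sketched as follows. The URLLC-phase utility is passed in as a table, so the baseline (throughput) utility and a conflict-penalized one can be compared on the same instance. Since the exact forms of $u^{\text{total}}$, $u^{\text{avg}}$, and $u^{\text{last pl.}}$ are not reproduced here, the example uses a hypothetical penalized utility $r_{b,k}/(1 + C^t_b)$ purely for illustration; ties are broken by block name to keep runs deterministic.

```python
def conflict_aware_allocation(alpha, rate, utility, urllc, embb, demand):
    """Two-phase greedy allocation in the spirit of Algorithm 1.

    utility[b][k] is the URLLC-phase utility; the eMBB phase uses rate[b][k].
    Returns {block: service}, or None if some URLLC demand cannot be met.
    """
    available = set(alpha)
    assignment = {}
    served = {k: 0 for k in urllc}
    pending = set(urllc)

    def take(b, k):
        assignment[b] = k
        # remove b itself and every block overlapping with it from the pool
        available.difference_update({p for p in available if alpha[p] & alpha[b]})

    # Phase 1 (URLLC): pick the (block, service) pair with maximal utility
    while pending and available:
        b, k = max(((b, k) for b in sorted(available) for k in sorted(pending)),
                   key=lambda bk: utility[bk[0]][bk[1]])
        take(b, k)
        served[k] += rate[b][k]
        if served[k] >= demand[k]:
            pending.discard(k)
    if pending:
        return None  # the demand of the remaining URLLC users cannot be met

    # Phase 2 (eMBB): pick the pair with maximal throughput until no block remains
    while available:
        b, k = max(((b, k) for b in sorted(available) for k in sorted(embb)),
                   key=lambda bk: rate[bk[0]][bk[1]])
        take(b, k)
    return assignment


# Hypothetical toy instance: mini-slots 0..3, URLLC "u", eMBB "e"
alpha = {"A": {0, 1}, "B": {2, 3}, "C": {0, 1, 2, 3}, "D": {0}, "E": {1}}
rate = {"A": {"u": 2, "e": 3}, "B": {"u": 2, "e": 3}, "C": {"u": 4, "e": 5},
        "D": {"u": 1, "e": 2}, "E": {"u": 1, "e": 2}}
conflicts = {b: sum(1 for p in alpha if p != b and alpha[p] & alpha[b]) for b in alpha}
baseline = {b: {"u": rate[b]["u"]} for b in alpha}
penalized = {b: {"u": rate[b]["u"] / (1 + conflicts[b])} for b in alpha}  # hypothetical form
for name, util in [("baseline", baseline), ("penalized", penalized)]:
    alloc = conflict_aware_allocation(alpha, rate, util, ["u"], ["e"], {"u": 2})
    embb_sum = sum(rate[b][k] for b, k in alloc.items() if k == "e")
    print(name, alloc, embb_sum)
# baseline {'C': 'u'} 0
# penalized {'B': 'u', 'A': 'e'} 3
```

On this toy instance, the throughput-only utility lets the large block C absorb the whole grid, leaving nothing for eMBB, whereas the penalized utility picks the low-conflict block B and leaves room for eMBB traffic.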

B. HEURISTIC INSPIRED FROM BIN PACKING OPTIMIZATION
In the standard bin packing problem formulation, the goal is to find the optimal placement of items of different volumes into the minimum number of containers (bins) of fixed volume [31]. Although bin packing is a combinatorial NP-hard problem, it arises in so many settings that various heuristics with different optimality gaps have been reported in the literature. Here, we propose a novel, computationally efficient scheduling approach, inspired by the refined-first-fit heuristic for the standard bin packing problem.
The proposed scheduling heuristic that accounts for conflicts is summarized in Algorithm 2; it jointly minimizes the number of $\mathcal{K}^{(\ell)}$ resource allocations (placements) and the throughput losses of $\mathcal{K}^{(c)}$ users. Allocation of resources to $\mathcal{K}^{(\ell)}$ and $\mathcal{K}^{(c)}$ services is treated sequentially but still in an interwoven approach, with URLLC being served first to meet the latency requirements. In the following, we denote by vector $e$ the aggregated throughput losses for each allocation of a block $b \in \mathcal{B}$, i.e., The proposed heuristic contains three steps: i) generation of bin packing categories, in Lines 1-6; ii) URLLC resource allocation, in Lines 7-18; and iii) eMBB allocation, in Lines 19-22. First, for each $k \in \mathcal{K}^{(\ell)}$ we generate $H$ categories (bins) with decreasing fractional sizes with respect to $q_k$. Category $i \in \{1, \ldots, H\}$ is defined as the set of all resource blocks $b \in \mathcal{B}$ for which the ceiling of the ratio of the service demand to the throughput of block $b$ equals $i$; equivalently, category $\text{Cat}_i^{U_k}$ contains the available resource blocks that each satisfy at least $1/i$-th of the service demand $q_k$. Formally, we define, The above operation is described in Lines 1-3. Afterwards, in Line 4, we remove, within each category, the overlapping blocks with the higher aggregated throughput loss, using (13). Note that $\text{Cat}_1^{U_1}$ is thus the category of the blocks that individually satisfy the whole demand of URLLC service $k = 1$.
9: if $|\text{Cat}_i^{U_k}| \ge i$ then
10:
12: Remove from $\mathcal{B}$ the blocks in $\mathcal{G}$ and those overlapping with the blocks in $\mathcal{G}$;
13: end if
14: if $q_k$ is not met then
15: Demand of $k \in \mathcal{K}^{(\ell)}$ cannot be satisfied
16: end if
17: until $q_k$ is met or $i = H$
18: end for
Phase ($\mathcal{K}^{(c)}$ resource allocation):
19: repeat
20: Remove from $\mathcal{B}$ the blocks in $\mathcal{G}$ and those overlapping with the blocks in $\mathcal{G}$;
21:
Therefore, the categories created for service $k \in \mathcal{K}^{(\ell)}$ range from $\text{Cat}_1^{U_k}$, containing the most valuable blocks (in terms of throughput $r_{b,k}$), to $\text{Cat}_H^{U_k}$, containing the least valuable blocks. Note that i) we need at most $i$ elements from $\text{Cat}_i^{U_k}$ to satisfy the demand $q_k$ of service $k \in \mathcal{K}^{(\ell)}$; ii) categories might be empty, so $H$ needs to be defined according to the expected throughput per mini-slot, as well as its variance.
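The category construction of Lines 1-4 can be sketched as follows for a single URLLC service $k$: block $b$ goes to category $i = \lceil q_k / r_{b,k} \rceil$ if $i \le H$, and within each category the blocks are ordered by increasing aggregated loss $e_b$, dropping any block that overlaps a lower-loss block already kept. Since the exact expression for $e_b$ is not reproduced here, the losses in the example are hypothetical numbers; blocks with zero throughput for $k$ (e.g., because they end after $\tau_k$) are skipped. Function and variable names are our own.

```python
import math


def build_categories(alpha, rate_k, q_k, loss, H):
    """Bin packing categories Cat_i for one URLLC service k, i = 1..H.

    alpha[b] : set of mini-slots covered by block b
    rate_k[b]: throughput r_{b,k} of block b for service k
    loss[b]  : aggregated eMBB loss e_b of allocating block b
    """
    raw = {i: [] for i in range(1, H + 1)}
    for b in sorted(alpha):
        if rate_k[b] <= 0:
            continue  # block cannot serve k (e.g., it ends after tau_k)
        i = math.ceil(q_k / rate_k[b])  # r_{b,k} >= q_k / i, so i such blocks cover q_k
        if i <= H:
            raw[i].append(b)
    cats = {}
    for i, members in raw.items():
        kept = []
        for b in sorted(members, key=lambda b: loss[b]):  # increasing e_b (stable sort)
            if all(not (alpha[b] & alpha[p]) for p in kept):
                kept.append(b)  # keep b unless it overlaps a lower-loss block
        cats[i] = kept
    return cats


alpha = {"A": {0, 1}, "B": {2, 3}, "C": {0, 1, 2, 3}, "D": {0}, "E": {1}}
rate_u = {"A": 2, "B": 2, "C": 4, "D": 1, "E": 1}
loss = {"A": 9, "B": 5, "C": 10, "D": 8, "E": 8}  # hypothetical e_b values
print(build_categories(alpha, rate_u, q_k=2, loss=loss, H=3))
# {1: ['B', 'A'], 2: ['D', 'E'], 3: []}
```

The empty third category illustrates why $H$ has to be chosen with the expected per-mini-slot throughput in mind.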
Next, we consider the allocation of the URLLC services. For each $k \in \mathcal{K}^{(\ell)}$, we select the first category $\text{Cat}_i^{U_k}$ containing at least $i$ elements, $i \in \{1, \ldots, H\}$, Lines 7-9. In this category, we subsequently introduce a further minimization problem in order to select the elements that incur the minimum loss to eMBB, i.e., min Note that if (15) is interpreted as a knapsack problem, each element of a given category has the same weight (equal to unity), while the values (losses in the specific instance) differ. Similar problems are encountered in different settings, e.g., the subcarrier resource allocation in [25]. Exploiting these previous results, we reuse a simple heuristic according to which the elements of each category are re-ordered in increasing aggregated loss $e_b$, $b \in \mathcal{B}$, Line 10. Subsequently, the first $i$ elements of category $\text{Cat}_i^{U_k}$ are allocated to URLLC, Line 11. After each allocation, the allocated blocks are removed from $\mathcal{B}$ and from all other categories, Line 12. The procedure is repeated until the demand $q_k$ is satisfied or no more categories exist for service $k$, Line 17; in the latter case, the solution is infeasible, Lines 14-16.
As an example, after this step, the first element of $\text{Cat}_1^{U_k}$ is the resource block that can single-handedly cover the demand $q_k$ of URLLC service $k$ while incurring the least aggregate losses for the eMBB users. The joint minimization of the number of $\mathcal{K}^{(\ell)}$ placements and of the losses due to conflicts is achieved simply by assigning to service $k \in \mathcal{K}^{(\ell)}$ the first $i$ elements of $\text{Cat}_i^{U_k}$, starting from $i = 1$, i.e., the allocation for demand $q_k$ starts from $\text{Cat}_1^{U_k}$. As explained before, the most valuable categories in terms of throughput satisfy URLLC services using the fewest resource blocks and result in the minimum number of $\mathcal{K}^{(\ell)}$ placements, which is expected, on average, to incur the minimum losses due to conflicts. Furthermore, having re-ordered the elements of each category in increasing eMBB loss, we jointly account for both constraints (5) and (6) in one go.
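Given such categories, the URLLC allocation of Lines 7-18 reduces to scanning them in order of increasing $i$ and assigning the first $i$ still-available elements of the first category that has enough of them. The sketch below assumes that each category is already sorted by increasing $e_b$ with mutually non-overlapping members, and that `available` is the shared pool of blocks, updated in place so that later services and the eMBB phase see the removals. The function name and data layout are hypothetical.

```python
def allocate_urllc(cats, alpha, rate_k, q_k, available):
    """Allocate blocks to one URLLC service k from its categories (hypothetical helper).

    cats[i]  : category i, ordered by increasing aggregated loss e_b
    available: set of still-available blocks, modified in place
    Returns the list of allocated blocks, or None if q_k cannot be met
    (blocks taken before the failure stay removed from the pool).
    """
    allocated, served = [], 0
    for i in sorted(cats):
        members = [b for b in cats[i] if b in available]
        if len(members) < i:
            continue  # category too small to cover q_k with i blocks
        for b in members[:i]:
            allocated.append(b)
            served += rate_k[b]
            # remove b and every block overlapping it from the pool
            available.difference_update({p for p in available if alpha[p] & alpha[b]})
        if served >= q_k:
            return allocated
    return None  # no category left: the demand of k cannot be satisfied


alpha = {"A": {0, 1}, "B": {2, 3}, "C": {0, 1, 2, 3}, "D": {0}, "E": {1}}
rate_u = {"A": 2, "B": 2, "C": 4, "D": 1, "E": 1}
cats = {1: ["B", "A"], 2: ["D", "E"], 3: []}  # categories already ordered by e_b
pool = set(alpha)
print(allocate_urllc(cats, alpha, rate_u, q_k=2, available=pool))  # ['B']
print(sorted(pool))  # ['A', 'D', 'E'] remain for eMBB
```

Sorting each category once dominates the cost of this step, which is consistent with the $O(N \log N)$ complexity quoted for the heuristic.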
In the last phase of the algorithm, Lines 20-22, the resource allocation to the $\mathcal{K}^{(c)}$ services takes place. This is performed by selecting the block-service pairs with the highest throughput $r_{b,k}$, $b \in \mathcal{B}$, $k \in \mathcal{K}^{(c)}$, among the remaining available blocks, i.e., those not allocated to a URLLC service, since once a block is allocated it is removed from $\mathcal{B}$. This step is iterated until no more blocks remain available.
Finally, we also consider a modified version of the bin packing based heuristic (mBP), targeting challenging time-frequency grids where infeasibility is the major issue. In this case, we introduce a preprocessing step to check the feasibility of each slot. We first compute the total throughput of all available block placements and compare it with the throughput resulting from the placement of all the available blocks for the URLLC services, in both cases with respect to the con- where $b \in \mathcal{B}$ are the blocks that satisfy constraint (3) and $\delta \in (0, 1)$, instead of using the $e_b$ metric for the allocation of the $k \in \mathcal{K}^{(\ell)}$ services, we switch the metric to $e_b = \max_{k \in \mathcal{K}^{(\ell)}} r_{b,k}$, in order to ensure the allocation of the URLLC services.

IV. NOMA FOR DOWNLINK SCHEDULING
In this section, we re-examine P0 under the assumption that NOMA can be employed in the downlink to schedule different services, even at the mini-slot level [27]. We extend our analysis to the NOMA approach in order to discuss the potential gains that stem from avoiding conflicts through the superposition of services in the same resource block. NOMA has previously been proposed as a competitive scheme to enhance throughput per resource block [26]; here, we further motivate its use as a means to mitigate conflicts in the allocation of resource blocks: by allowing the superposition of users, puncturing and preemptive scheduling can be avoided.
First, we consider the multiple-user NOMA (mNOMA) scenario, where multiple users may share the same resource block. In this case, P0 reduces to a linear program (LP), since this scheme allows blocks to overlap, either fully or partially (on some of the mini-slots of the resource block). We refer to the corresponding optimization problem as P1, noting that the optimization variable is now a real number $x_{b,k} \in [0, 1]$ indicating the fraction of block $b \in \mathcal{B}$ assigned to service $k \in \mathcal{K}$; in particular, the per-mini-slot constraint of P1 reads
$$\sum_{b\in\mathcal{B}}\sum_{k\in\mathcal{K}} \alpha_{b,i}\,x_{b,k} \le \bar{r},\quad i\in\mathcal{I}, \tag{18}$$
where $\bar{r} \ge 1$ denotes the normalized sum throughput per block achieved with NOMA [32]. Note that the corresponding OMA constraint (6) is upper bounded by unity, pointing to a further gain of NOMA due to the increased per-resource-block utilization. However, as in this work we aim primarily at demonstrating the gains due to conflict avoidance, in the numerical results presented in Section V we simply use $\bar{r} = 1$.
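The difference between the OMA constraint (6) and the mNOMA constraint (18) lies in the admissible values of $x_{b,k}$ and in the right-hand side of the same per-mini-slot load check. The sketch below computes the load $\sum_b\sum_k \alpha_{b,i} x_{b,k}$ of every mini-slot for a given (possibly fractional) allocation and tests it against a bound: 1 for OMA, $\bar{r}$ for mNOMA. The block and service names and the data layout are illustrative assumptions.

```python
from collections import defaultdict


def minislot_load(alpha, x):
    """Load of each mini-slot i: sum over (b, k) of alpha_{b,i} * x_{b,k}."""
    load = defaultdict(float)
    for (b, k), share in x.items():
        for i in alpha[b]:
            load[i] += share
    return dict(load)


def satisfies_load(alpha, x, bound=1.0, tol=1e-9):
    """Constraint (6) for bound = 1 with binary x; constraint (18) for bound = r_bar."""
    return all(v <= bound + tol for v in minislot_load(alpha, x).values())


alpha = {"wide": {0, 1}, "narrow": {1}}  # hypothetical blocks sharing mini-slot 1
full = {("wide", "e1"): 1.0, ("narrow", "u1"): 1.0}
half = {("wide", "e1"): 0.5, ("narrow", "u1"): 0.5}
print(satisfies_load(alpha, full))             # False: mini-slot 1 carries load 2
print(satisfies_load(alpha, half))             # True: fractional shares fit when r_bar = 1
print(satisfies_load(alpha, full, bound=2.0))  # True if r_bar = 2
```

Even with $\bar{r} = 1$, as used in Section V, fractional shares let two partially overlapping blocks coexist, which is the conflict-avoidance effect isolated in this work.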
A known issue of mNOMA is that error propagation in successive decoding can compromise performance; to alleviate such effects, NOMA with user pairing has recently gained considerable attention. In this framework, we implement the 2-user NOMA (2u-NOMA) scheme, since this approach provides lower decoding complexity, shorter delay and higher reliability than mNOMA [33]. For the power allocation problem in a downlink 2-user NOMA system, it has been proven in [33] that, under optimal power allocation, the achievable rate of the 2u-NOMA user with the lower channel gain equals that of the corresponding OMA user, with the remaining power allocated to the strong 2u-NOMA user.
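The power-allocation statement above can be illustrated numerically. The sketch below assumes single-antenna AWGN channels with gains $g_w \le g_s$, perfect SIC at the strong user, and an OMA reference in which each user transmits with full power on half of the resource; under these assumptions it gives the weak user exactly the power needed to match its OMA rate and hands the remainder to the strong user. It illustrates the idea only and is not the derivation in [33].

```python
import math


def oma_rate(p_total: float, gain: float, noise: float, share: float = 0.5) -> float:
    """OMA rate (bit/s/Hz) of a user that transmits with full power on `share` of the resource."""
    return share * math.log2(1.0 + p_total * gain / noise)


def two_user_noma(p_total: float, g_weak: float, g_strong: float, noise: float):
    """Match the weak user's OMA rate with the least power; the rest goes to the strong user."""
    if g_weak > g_strong:
        raise ValueError("g_weak must not exceed g_strong")
    target = oma_rate(p_total, g_weak, noise)
    t = 2.0 ** target - 1.0  # SINR the weak user needs to reach its OMA rate
    # Solve p_w * g_w / ((P - p_w) * g_w + N0) = t for p_w.
    p_weak = t * (p_total * g_weak + noise) / ((1.0 + t) * g_weak)
    p_strong = p_total - p_weak
    # Weak user decodes its own signal, treating the strong user's signal as noise.
    r_weak = math.log2(1.0 + p_weak * g_weak / (p_strong * g_weak + noise))
    # Strong user removes the weak user's signal via (perfect) SIC.
    r_strong = math.log2(1.0 + p_strong * g_strong / noise)
    return p_weak, p_strong, r_weak, r_strong
```

For $P = 1$, $g_w = 1$, $g_s = 10$ and $N_0 = 0.1$, the weak user receives about 77% of the power and keeps its OMA rate of roughly 1.73 bit/s/Hz, while the strong user obtains about 4.6 bit/s/Hz instead of about 3.3 bit/s/Hz under OMA.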
In contrast to P1, 2u-NOMA allows at most two blocks to overlap, either fully or partially (on some of their mini-slots). In light of this, P1 is reformulated as a mixed integer programming (MIP) problem by adding the supplementary binary variable $y_{b,k} \in \{0, 1\}$, which indicates whether block $b$ is assigned to service $k$. To construct $y_{b,k}$, we use the ceiling operator $\lceil \cdot \rceil$, which maps a real number $x$ to the smallest integer greater than or equal to $x$. The new MIP problem, referred to as $\widetilde{\mathrm{P1}}$, follows. The additional constraint (23) ensures that at most two overlapping blocks are allowed per mini-slot. Furthermore, (24) forbids overlapping URLLC resource blocks in order to avoid additional decoding overhead. Constraint (25) ensures that, if a URLLC resource block overlaps with an eMBB resource block, the throughput of the eMBB resource block is higher. Finally, constraints (20) and (21) guarantee that the URLLC user is always the weak user and thus, in downlink NOMA, is decoded first [33], so that the NOMA decoding order introduces no extra latency for URLLC users.
VOLUME 10, 2022
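A small feasibility check for the two structural rules described above (at most two overlapping blocks per mini-slot, and no overlap between URLLC blocks) can be sketched as follows. It assumes that the indicator is obtained as $y_{b,k} = \lceil x_{b,k} \rceil$ and uses a hypothetical dictionary encoding of blocks and shares; the full constraint set (20)-(25) of the MIP is not reproduced, and the throughput-ordering rule of (25) is not checked.

```python
import math
from typing import Dict, FrozenSet, List, Set, Tuple

Blocks = Dict[int, FrozenSet[int]]          # block b -> mini-slots it covers
Shares = Dict[Tuple[int, str], float]       # (block b, service k) -> x_{b,k}


def indicators(x: Shares) -> Dict[Tuple[int, str], int]:
    """y_{b,k} = ceil(x_{b,k}): 1 as soon as any fraction of block b goes to service k."""
    return {bk: math.ceil(share) for bk, share in x.items()}


def feasible_2u_noma(blocks: Blocks, x: Shares, urllc: Set[str]) -> bool:
    """Check the overlap rules described for (23) and (24)."""
    occupants: Dict[int, List[str]] = {}
    for (b, k), y in indicators(x).items():
        if y:
            for i in blocks[b]:
                occupants.setdefault(i, []).append(k)
    for services in occupants.values():
        if len(services) > 2:                       # at most two overlapping blocks
            return False
        if sum(k in urllc for k in services) > 1:   # no two URLLC blocks overlap
            return False
    return True
```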

V. NUMERICAL RESULTS
In this section, we present numerical results for both OMA and NOMA schemes, for different 5G URLLC configurations and numerologies: fixed, multiple-fixed and flexible. This exercise highlights the importance of flexible numerology while motivating NOMA as a conflict mitigation approach. Here, we focus mainly on the conflict aspect rather than on deployment, feasibility or coordination issues, which are important enough to deserve an independent study. We then move on to a comparative analysis of the proposed heuristic Algorithms 1 (conflict aware, CA) and 2 (bin packing based, BPB) for OMA, as a proof of concept for conflict-aware scheduling. We use the simulation setup given in [1], implemented on the basis of the control channel overhead model for supporting flexible numerology defined in [34], and account for the effect of the guard band (i.e., of the cyclic prefix) on the achievable data rate, as modeled in [35]. The throughput per block $r_{b,k}$ depends on the configuration of block $b$ (see Table 2) and is computed with the model introduced in [37] over a total of nine multipath channel profiles [36]; for URLLC users, the throughput values incorporate the delay constraints, so that non-zero throughput is available only in those blocks in which the delay constraint is met. The throughput model also considers intersymbol interference (ISI) as a function of the CP, and approximates the inter-channel interference (ICI) between neighboring subbands of different numerologies.
The simulation parameters are given in Table 3. We assume a time-frequency grid in which each slot spans 2 msec and 2 MHz (i.e., a grid of 16 × 11 mini-slots). As a result, each slot comprises a set $\mathcal{I} = \{1, \dots, 176\}$ of mini-slots and a corresponding set $\mathcal{B} = \{1, \dots, 549\}$ of candidate blocks with respect to the numerology, where every candidate block consists of 4 elements of $\mathcal{I}$. The resource block details are given in Table 2. Blocks of shape 1 (4 × 1 mini-slots), $\mathcal{B}_1 \subset \mathcal{B}$, comprise $|\mathcal{B}_1| = 143$ resource blocks. Blocks of shape 2 (2 × 2 mini-slots), $\mathcal{B}_2 \subset \mathcal{B}$, comprise $|\mathcal{B}_2| = 150$ resource blocks. Blocks of shapes 3 and 4 (1 × 4 mini-slots), $\mathcal{B}_3, \mathcal{B}_4 \subset \mathcal{B}$, comprise the same number of blocks, $|\mathcal{B}_3| = |\mathcal{B}_4| = 128$. Moreover, the latency tolerance and bit rate demands chosen for the URLLC users are τ = {0.5, 1, 1.5, 2} msec and q = {16, 32, 64, 128, 256, 512} kbits/sec (kbps), respectively. The latency tolerance of the eMBB users is fixed to τ = 2 msec. SNR values are drawn uniformly from the interval [5, 30] dB. In the following, "optimal scheduling" denotes the solutions provided by the Gurobi optimization solver, which serve as a benchmark for evaluating the optimality gap of the proposed heuristics.
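The block counts can be cross-checked with a simple counting argument: an $h \times w$ rectangle fits into an $R \times C$ grid in $(R-h+1)(C-w+1)$ positions. The sketch assumes that the first grid dimension (16) is the one along which shape 1 spans four mini-slots; with that reading it reproduces the reported values.

```python
from typing import Dict, Tuple


def placements(grid_rows: int, grid_cols: int, block_rows: int, block_cols: int) -> int:
    """Number of positions of a block_rows x block_cols rectangle inside the grid."""
    if block_rows > grid_rows or block_cols > grid_cols:
        return 0
    return (grid_rows - block_rows + 1) * (grid_cols - block_cols + 1)


GRID: Tuple[int, int] = (16, 11)  # mini-slots per slot; orientation is an assumption
SHAPES: Dict[int, Tuple[int, int]] = {1: (4, 1), 2: (2, 2), 3: (1, 4), 4: (1, 4)}

counts = {shape: placements(*GRID, *dims) for shape, dims in SHAPES.items()}
total = sum(counts.values())
```

This yields 13 × 11 = 143 placements for shape 1, 15 × 10 = 150 for shape 2 and 16 × 8 = 128 for each of shapes 3 and 4, i.e., 549 candidate blocks over the 176 mini-slots.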
Finally, we consider three scenarios for the arrival of the URLLC and eMBB services in each slot: i) 5 URLLC and 5 eMBB users per slot, i.e., $|\mathcal{K}^{(c)}| = |\mathcal{K}^{(\ell)}| = 5$; ii) 10 users in total per slot, where the number of URLLC services is a Poisson-distributed random variable, i.e., $|\mathcal{K}^{(\ell)}| \sim \mathrm{Pois}(5)$ […]

First, we compare the performance of OMA and NOMA schemes for different numerologies. In the case of fixed numerology, blocks of shape 1 (horizontal), shape 2 (square) and shape 3 (vertical) are considered separately. Furthermore, capturing a common scenario in practical systems [5], we define multiple-fixed numerology as the case in which eMBB uses resource blocks of shape 1 (horizontal) and URLLC uses blocks of shape 3 (vertical). Finally, in the case of flexible numerology, all block shapes given in Table 2 are available to all services. In this subsection, we consider $|\mathcal{K}^{(\ell)}| = |\mathcal{K}^{(c)}| = 5$.

Fig. 2 shows the sum bit rate of the eMBB services $\mathcal{K}^{(c)}$ under the optimal i) OMA, ii) 2u-NOMA, and iii) mNOMA scheduling. The NOMA sum bit rate gains with respect to OMA are depicted with the lighter color in each bar. The latency tolerance and bit rate demands considered are τ = 1 msec and q = {16, 32, 64, 128, 256, 512} kbps, respectively, for five $\mathcal{K}^{(\ell)}$ and five $\mathcal{K}^{(c)}$ users. In all cases, as expected, flexible numerology significantly outperforms fixed and multiple-fixed numerology. Moreover, multiple-fixed numerology surpasses fixed numerology in the OMA case. These results show that flexible numerology combined with NOMA can offer distinct gains across varying URLLC demands. Notably, as the URLLC demands increase, flexible numerology is the only approach that avoids infeasibility, i.e., failing to serve all URLLC demands.
Focusing on the comparison between OMA and NOMA, both NOMA schemes consistently outperform OMA. More precisely, NOMA-based scheduling increases the sum throughput of eMBB users particularly under fixed numerology, although it also improves performance under flexible numerology. On the other hand, NOMA does not affect performance under multiple-fixed numerology, because in the specific grid used in the simulations, the overlapping of blocks is limited under multiple-fixed numerology. Furthermore, the gains of NOMA are more pronounced for lower URLLC demands. Finally, for this grid, the additional gain of mNOMA over 2u-NOMA is negligible, especially for lower $q_k$ demands.
Furthermore, Fig. 3 shows the performance gap between OMA and NOMA, normalized to the NOMA performance and expressed as a percentage, for different numerologies. The superiority of NOMA is reconfirmed for both fixed and flexible numerology, for different values of the URLLC latency tolerance $\tau_k$ = {0.5, 1, 1.5, 2} msec, $k \in \mathcal{K}^{(\ell)}$. Moreover, under flexible numerology, the lower the delay tolerance $\tau_k$, the higher the gain of NOMA over OMA. The performance fluctuations in Fig. 3 are strongly related to the different values of the bit rate demands $q_k$, $k \in \mathcal{K}^{(\ell)}$. More precisely, a close inspection of the simulation outputs shows that the gap between the demand of a service $k \in \mathcal{K}^{(\ell)}$ and the achievable throughput of the block to which it is allocated plays an important role. In OMA this surplus cannot be reused, so a larger gap considerably reduces the overall throughput available for scheduling the $\mathcal{K}^{(c)}$ services, which in turn gives a crucial advantage to NOMA, since it allows resource blocks to be shared.
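The role of the demand-throughput gap can be quantified with a one-line calculation. The sketch below, under the simplifying assumption that each URLLC service occupies a single block whose full throughput is reserved for it under OMA, sums the surplus $r_{b,k} - q_k$ that OMA leaves unused; under NOMA, part of this surplus can instead be shared with eMBB. The function name and the block throughputs in the example are hypothetical.

```python
from typing import Iterable, Tuple


def stranded_throughput(urllc_placements: Iterable[Tuple[float, float]]) -> float:
    """Sum of (r_b - q_k) over URLLC placements given as (block throughput, demand) in kbps.

    Under OMA, this surplus stays inside blocks that no eMBB service can use.
    """
    return sum(max(rate - demand, 0.0) for rate, demand in urllc_placements)
```

With a hypothetical block throughput of 512 kbps, a 32 kbps URLLC service strands 480 kbps while a 256 kbps service strands 256 kbps, which is consistent with the larger NOMA gains observed for lower URLLC demands.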
Fig. 4 depicts examples of scheduling on the time-frequency grid for OMA, 2u-NOMA and mNOMA, for $\tau_k$ = {0.5, 1} msec and $q_k$ = {32, 256} kbps, respectively, for all $k \in \mathcal{K}^{(\ell)}$. In the case of OMA with $q_k$ = 32 kbps, depicted in Fig. 4(a), sharing of resource blocks is not allowed, whereas in the case of NOMA, depicted in Figs. 4(c) and (e), the opportunity to share resource blocks increases the sum throughput for eMBB, i.e., all of the blocks assigned to URLLC services are shared with blocks assigned to eMBB services. Notice also that the total number of blocks assigned to the eMBB services is the same for mNOMA and 2u-NOMA, even though mNOMA results in a higher number of shared blocks.
Similar outcomes are depicted in Fig. 4(b) for OMA and in Figs. 4(d) and (f) for NOMA, where a higher value of $q_k$ = 256 kbps is considered. In this case, however, not all of the blocks are assigned to eMBB services (under both NOMA schemes) due to the higher URLLC demand.

B. PERFORMANCE OF PROPOSED HEURISTIC ALGORITHMS
Although NOMA clearly outperforms OMA, its use might be prohibited by a number of factors, including the need for multiple decoding steps and the impact of imperfect successive interference cancelation (SIC). The evaluation of OMA scheduling approaches therefore remains paramount. In this subsection, we discuss the proposed heuristics. As a validation step, we first compare the optimality gaps, with respect to the global optimum of P0, of the baseline heuristic (presented in [1]) and of the proposed conflict-aware heuristics with utilities $u_{\text{total}}$, $u_{\text{avg}}$ and $u_{\text{last pl.}}$, denoted by CA(·) with the corresponding utility matrix as input. Then, we provide additional results for all proposed heuristics under flexible numerology. For these experiments, we assume that the number of URLLC users follows $|\mathcal{K}^{(\ell)}| \sim \mathrm{Pois}(5)$ and that the total number of users in each slot is constant, $|\mathcal{K}| = 10$. Fig. 5 depicts the optimality gap i) of the baseline, the conflict-aware variants and the bin packing based approach (first row), and ii) of the LP-LD relaxation of P0 (second row) for several values of the maximum number of sub-gradient iterations, with respect to the bit rate demand and the latency tolerance of the $\mathcal{K}^{(\ell)}$ services.
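The arrival model used in these experiments (a Poisson number of URLLC users within a fixed total of 10 users per slot) can be sampled as sketched below. Python's `random` module has no Poisson sampler, so the sketch uses Knuth's multiplication method; capping the draw at the slot total is our assumption, since the text does not state how draws above 10 are handled, and the function names are hypothetical.

```python
import math
import random
from typing import Optional, Tuple


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small means such as lam = 5."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def slot_users(total: int = 10, lam: float = 5.0,
               rng: Optional[random.Random] = None) -> Tuple[int, int]:
    """Return (|K^(l)|, |K^(c)|): Poisson URLLC count, remaining users are eMBB."""
    if rng is None:
        rng = random.Random()
    n_urllc = min(poisson(lam, rng), total)  # assumption: cap at the slot total
    return n_urllc, total - n_urllc
```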
In the first row of Fig. 5, the conflict-aware and bin packing based heuristics outperform the baseline heuristic in most cases, especially for higher latency tolerance values (Fig. 5(b) and (c)), and provide similar results for lower latency tolerance values (Fig. 5(a)). The only exception is CA($u_{\text{last pl.}}$), which outperforms the baseline approach only for τ = 1.5 msec. More precisely, CA($u_{\text{total}}$), CA($u_{\text{avg}}$) and BP clearly outperform all other approaches, maintaining an optimality gap below 10% for $\tau_k$ = 0.5 msec and close to 7% for $\tau_k$ = {1, 1.5} msec.
The second row of Fig. 5 depicts the optimality gap of the LP-LD heuristic solutions against the global optimum, for threshold values M = {10, 20, 50} of the maximum number of sub-gradient iterations. Incorporating the utility matrices $u_{LP}, u_{LD} \in \mathbb{R}^{B \times K}$ leads to similar results (not depicted for compactness). As expected, higher values of M further reduce the optimality gap at the cost of higher computational time. The choice M = 10 results in very high optimality gaps, close to 20% in most cases. On the other hand, for M = 20 and M = 50 the heuristics maintain an optimality gap close to or below 10% for the chosen latency tolerance values. Note that the optimality gaps of the proposed heuristics are comparable to those of the LP-LD variants in all cases, as can be seen by comparing the two rows of Fig. 5; the reduction of the optimality gap obtained with the LP-LD utility matrices, however, comes with a significant increase in computational time.
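To illustrate the role of the iteration threshold M, the sketch below runs a generic projected sub-gradient method on the Lagrangian dual of a toy conflict-constrained assignment problem, in which the mini-slot constraints are relaxed with non-negative prices. It is not the LP-LD procedure of [1]: the toy model (each service takes at most one block), the diminishing step size and the function name are assumptions. It only shows why a larger iteration budget can tighten the bound at a proportionally higher cost.

```python
from typing import FrozenSet, List


def lagrangian_bound(r: List[List[float]], blocks: List[FrozenSet[int]],
                     n_slots: int, max_iter: int = 20, step0: float = 1.0) -> float:
    """Best Lagrangian upper bound found within max_iter sub-gradient steps.

    Toy problem: maximize sum r[b][k] * x[b][k] over binary x, with each service k
    taking at most one block and each mini-slot used by at most one block.
    The mini-slot constraints are relaxed with prices lam[i] >= 0.
    """
    n_blocks, n_services = len(r), len(r[0])
    lam = [0.0] * n_slots
    best = float("inf")
    for it in range(max_iter):
        usage = [0] * n_slots
        value = sum(lam)  # constant term of the Lagrangian
        for k in range(n_services):
            # Reduced profit of every block for service k under the current prices.
            reduced = [r[b][k] - sum(lam[i] for i in blocks[b]) for b in range(n_blocks)]
            b_star = max(range(n_blocks), key=reduced.__getitem__)
            if reduced[b_star] > 0:
                value += reduced[b_star]
                for i in blocks[b_star]:
                    usage[i] += 1
        best = min(best, value)  # every dual value upper-bounds the optimum
        step = step0 / (it + 1)  # diminishing step size
        # Projected sub-gradient step on the dual, with g_i = 1 - usage_i.
        lam = [max(0.0, price - step * (1 - used)) for price, used in zip(lam, usage)]
    return best
```

On a three-block, two-service instance whose optimum is 5, the first dual value is 5.5, and the best bound can only decrease, never below 5, as `max_iter` grows.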
Furthermore, we use our implementation to quantify the processing cost of the optimal and heuristic approaches. The computational time is measured on a Lenovo IdeaPad 510-15IKB laptop with an Intel Core i7-7500U @ 2.70 GHz processor and 12 GB RAM. Fig. 6 depicts the processing cost of i) the optimal solution; ii) the baseline heuristic variants (without the LP-LD utilities); iii) the bin packing based approach; and iv) the LP-LD heuristic with threshold value M = {10, 20}, for $q_k$ = {16, 32, 64, 128, 256} kbps and a conventional latency tolerance τ = 1 ms. As shown, the LP-LD solution is much more computationally intensive than the other heuristic approaches, and even than the optimal solution. Note that higher threshold values drastically increase the processing cost; e.g., for M = 50 the processing cost reaches 22 sec. In contrast, the processing cost of the bin packing and conflict-aware heuristics is between 0.3 and 0.5 sec, their complexity being O(N log N). Based on this comparison, we exclude the approaches that require the LP-LD utility matrices from the subsequent experiments.
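The O(N log N) complexity is what one expects from a greedy placement whose cost is dominated by a single sort of the N candidate (block, service) pairs. The sketch below is a generic conflict-aware greedy pass in that spirit, not Algorithm 1 or 2 of the paper: the utility values are supplied by the caller (standing in for $u_{\text{total}}$, $u_{\text{avg}}$ or similar), and the demand bookkeeping and function name are assumptions. Since every block covers four mini-slots, each conflict test takes constant time, so the sort dominates.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[int, str]  # (block b, service k)


def greedy_schedule(blocks: Dict[int, FrozenSet[int]],
                    utility: Dict[Pair, float],
                    rate: Dict[Pair, float],
                    demand: Dict[str, float]) -> List[Pair]:
    """Place (block, service) pairs in decreasing utility order, skipping conflicts."""
    used: Set[int] = set()          # mini-slots already taken
    remaining = dict(demand)        # unmet demand per service
    chosen: List[Pair] = []
    for b, k in sorted(utility, key=utility.get, reverse=True):  # O(N log N)
        if remaining[k] <= 0 or blocks[b] & used:
            continue                # demand already met, or block conflicts
        chosen.append((b, k))
        used |= blocks[b]
        remaining[k] -= rate[(b, k)]
    return chosen
```

eMBB services can be given an infinite demand, so that they absorb any block that remains conflict-free.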
In Fig. 7, we illustrate the cumulative optimality gap over time, where the throughput demands and the latency tolerance of the URLLC services are randomly chosen in each time instance from $q_k$ = {16, 32, 64, 128, 256} kbps and $\tau_k$ = {0.5, 1, 1.5, 2} msec, respectively. The results demonstrate the superior performance of the conflict-aware and bin packing based heuristics: the optimality gap converges to 6% for BP, CA($u_{\text{total}}$) and CA($u_{\text{avg}}$), whereas that of the baseline approach converges to 10%.

In the next set of experiments, we assume that the number of URLLC services follows a Poisson distribution, $|\mathcal{K}^{(\ell)}| \sim \mathrm{Pois}(5)$, and that the number of eMBB services per slot is constant, $|\mathcal{K}^{(c)}| = 5$. Figs. 8-10 show that the performance of the heuristic algorithms compares well with the global optimum (obtained with the Gurobi solver), while keeping the complexity very low. Note that the proposed algorithms exceed the performance of the baseline heuristic, especially for τ > 0.5 msec, confirming the results of the previous experiments in Figs. 5(a) and 7. This shows that the reformulation of optimal scheduling as a conflict minimization problem is highly pertinent and sheds light on how to jointly address constraints (5) and (6) of P0. More elaborate heuristics could also be proposed in the same context, e.g., by considering algorithms with lower optimality gaps with respect to the optimal bin packing solution.

The same conclusions can be drawn from Figs. 11 and 12 for URLLC demands of 128 and 256 kbps, respectively. In these cases, the conflict-aware choices exceed the performance of the baseline heuristic, with the exception of the CA($u_{\text{last pl.}}$) metric, which underperforms the baseline heuristic for τ = 1 ms, and for $q_k$ = 256 kbps with τ = 0.5 msec. Moreover, the CA($u_{\text{total}}$), CA($u_{\text{avg}}$) and BP heuristics perform similarly.
Finally, Fig. 13 illustrates the cumulative optimality gap over time, considering random $q_k$ and $\tau_k$ values for each $k \in \mathcal{K}^{(\ell)}$ (as in Fig. 7) and further assuming $|\mathcal{K}^{(\ell)}| \sim \mathrm{Pois}(6)$ and $|\mathcal{K}^{(c)}| = 5$. The superior performance of the proposed heuristics is reconfirmed, especially for BP, CA($u_{\text{avg}}$) and CA($u_{\text{total}}$), which significantly reduce the gap: they converge to 6%, whereas the baseline heuristic converges to 11%.
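The cumulative optimality gap plotted in Figs. 7 and 13 can be computed from per-slot sum throughputs. The sketch assumes that the gap at time $t$ is defined on accumulated throughput, $1 - \sum_{s \le t} R^{\text{heur}}_s / \sum_{s \le t} R^{\text{opt}}_s$, where $R_s$ is the eMBB sum throughput in slot $s$; the text does not state the exact definition, and a running mean of per-slot gaps would be an equally plausible alternative.

```python
from itertools import accumulate
from typing import List


def optimality_gap(opt: float, heur: float) -> float:
    """Relative shortfall of a heuristic with respect to the optimum in one slot."""
    return (opt - heur) / opt


def cumulative_gap(opt: List[float], heur: List[float]) -> List[float]:
    """Gap on accumulated throughput after each slot (assumed definition)."""
    return [1.0 - h / o for o, h in zip(accumulate(opt), accumulate(heur))]
```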

VI. CONCLUSION AND FUTURE WORK
In 5G and beyond networks, URLLC services will coexist with eMBB services, giving rise to challenging layer 2 scheduling problems. To address them, we have reformulated the standard eMBB throughput maximization problem as an equivalent conflict minimization problem, which aims at minimizing the overall number of conflicts. Building on this premise, two lightweight and efficient scheduling approaches were proposed: a family of conflict-aware heuristics that employ conflict-aware utilities, and a heuristic inspired by the bin packing problem. In addition to the proposed scheduling using orthogonal multiple access (OMA), we further proposed the use of non-orthogonal multiple access (NOMA) to mitigate conflicts, and investigated the potential advantages of allowing non-orthogonal sharing of radio resources with flexible numerology and frame structure. The intuition that NOMA performs better because it alleviates conflicts was demonstrated to hold; importantly, NOMA can offer significant advantages particularly under ultra-low latency constraints for the URLLC users. Extensive simulations were performed for URLLC services with different QoS requirements for both OMA and NOMA scenarios. The simulation results showed that i) all of the proposed heuristics achieve near-optimal performance, demonstrating that conflict minimization is indeed key to layer 2 scheduling, and ii) there are significant gains in terms of resource utilization when employing NOMA.
In future work, we will extend the approach to a wider range of performance metrics for the eMBB users, e.g., minimum expected achieved rate and fairness. In this framework, we will also consider scheduling schemes for the uplink.