Intelligent Resource Management for eMBB and URLLC in 5G and Beyond Wireless Networks

In the era of 5G and beyond wireless networks, the simultaneous support of enhanced Mobile Broadband (eMBB) and Ultra-Reliable Low-Latency Communications (URLLC) poses significant challenges in managing radio resources efficiently. By leveraging the puncturing technique, we propose an intelligent resource management framework that meets the strict latency and reliability requirements of URLLC services and the high data rate requirements of eMBB services. In particular, an architecture based on semi-supervised learning and deep reinforcement learning (DRL) is proposed to manage the resources intelligently. We decompose the optimization problem into two subproblems: 1) a resource block allocation (RBA) strategy for the eMBB slice, and 2) URLLC scheduling. Through extensive simulations and performance evaluations, we demonstrate the effectiveness of the proposed technique in optimizing resource utilization, minimizing latency for URLLC users, and maximizing the throughput of eMBB services. The simulation results show that the proposed methodology can ensure the URLLC reliability requirements while maintaining a higher average sum rate for eMBB and a higher convergence rate. The proposed framework paves the way for the efficient coexistence of diverse services, enabling wireless network operators to optimize resource allocation, improve user experience, and meet the specific requirements of eMBB and URLLC applications.


I. INTRODUCTION
The 5th Generation (5G) network is revolutionizing human lives by stretching the performance bounds of mobile networks to support a variety of use cases. Industrial networks have attracted growing interest due to the increasing demand for digitalization, and they have a large prospect of growth in the years to come. However, due to the heterogeneity of the network, it is challenging to implement different types of services on existing networks. Different services have different requirements; for example, control for precision manufacturing and automation needs to meet strict reliability and latency criteria [1]. The limitations of traditional networks make it more challenging to allocate resources so that these requirements are met [2]. Next-generation wireless networks (NGWN) can overcome these issues and provide a flexible environment where resources can be managed intelligently at a low cost. According to the ITU Radiocommunication Sector (ITU-R), 5G services are categorized into three classes: enhanced Mobile Broadband (eMBB), Ultra-Reliable Low-Latency Communications (URLLC), and massive Machine-Type Communications (mMTC) [3], [4]. The purpose of eMBB services is to serve high data rate applications such as augmented reality (AR), virtual reality (VR), and ultra high definition (UHD) video with tolerable reliability [5]. On the other hand, URLLC services focus on high-reliability and low-latency communications by transmitting short packets with a packet error rate (PER) in the range of $10^{-6}$. URLLC covers mission-critical applications such as vehicular communications, remote health services, and industrial automation. Packets are transmitted at shorter transmission time intervals (TTI) to meet the low-latency requirements. In Long Term Evolution (LTE) systems, latency is higher because control messages occupy a large part of the transmission. Therefore, certain modifications are proposed at the physical layer of 5G new radio (NR) systems to fulfill the requirements of URLLC [6]. The objective of mMTC, in contrast, is to accommodate a massive number of Internet of Things (IoT) devices, where each device can communicate with the other devices and with the base station (BS) at a low data rate.
Generally, eMBB and URLLC are the services most discussed in 5G networks [7]. We consider these two services in this paper and propose a novel approach to optimize their performance. The pre-5G architecture, in contrast, does not support these two services [8]. The latency and reliability issues can be overcome by leveraging the concept of network slicing (NS) [9]. In NS, a single common physical channel is partitioned into multiple logical sub-networks, where each logical sub-network has its dedicated channel [10]. Better utilization of the resources can be achieved through logical separation and virtualization, which makes NS more flexible. Each logical sub-network has its own radio access approach and network virtualization functions (NVF). Radio access network (RAN) slicing is a type of NS that focuses on the RAN portion of the network. It allows operators to provide dedicated logical networks with customer-specific functionality without losing the economies of scale of a common infrastructure. 5G RAN slicing helps operators manage the RAN resources needed for NS to operate. RAN slicing enables radio resources to be sliced in different ways, such as allocating different physical resource blocks (PRB) to different network slices.
In RAN slicing, resource management can be challenging due to the heterogeneity of the network. The available radio resources to meet the requirements of the URLLC and eMBB slices are limited. In URLLC, packets need to have a short symbol length to meet the low-latency requirements. One option is to transmit the packet with a reduced symbol period. However, this approach is only suitable for mm-Wave bands, where the smaller cell size leads to a smaller delay spread [11]. Another possible method involves utilizing mini-slots with shorter TTIs by reducing the number of symbols to a minimum [11]. Slicing can be implemented on the RAN and core network (CN) parts. In this paper, we focus on the RAN part by intelligently allocating the resources to each user equipment (UE) according to the user demand. Note that the dynamic assignment of resources combined with rapidly changing user demands poses a formidable challenge. We utilize the above-mentioned features to design an intelligent and effective approach to the resource allocation problem of URLLC and eMBB services.

The development of a dynamic framework for radio resource allocation in RAN slicing has become a primary focus for researchers [12], [13]. The heterogeneous traffic in the network requires an optimal resource management approach to meet the quality of service (QoS) requirements. The URLLC service cannot be held back due to its stringent low-latency requirement, so arriving URLLC traffic needs to be given priority over any ongoing eMBB transmission. To achieve this aim, two schemes have been proposed in the 3GPP standard [14]: 1) puncturing, and 2) orthogonal scheduling. In puncturing, to meet the latency requirement, the BS stops the ongoing eMBB transmission, and URLLC packets are scheduled in mini-slots over the already scheduled eMBB transmission. Puncturing can achieve the low-latency requirements, but it can affect the capacity and reliability of the system. In orthogonal scheduling, frequency channels are reserved in advance for the URLLC transmission. The drawback of this scheme is that, when there is no URLLC traffic, the frequency channels reserved for URLLC transmission are not utilized, which results in wastage of resources.
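The trade-off between the two schemes can be illustrated with a toy resource count. The sketch below is illustrative only: the function names, the grid size, and the URLLC demand per TTI are hypothetical and are not taken from the 3GPP standard or from our simulation setup. Resources are counted in RB-equivalents per TTI.

```python
def orthogonal_usage(total_rbs, reserved_rbs, urllc_demand):
    """Orthogonal scheduling: `reserved_rbs` are withheld for URLLC in advance."""
    served_urllc = min(urllc_demand, reserved_rbs)
    embb_rbs = total_rbs - reserved_rbs          # eMBB never touches the reservation
    wasted = reserved_rbs - served_urllc         # reserved but idle resources
    return {"embb": embb_rbs, "urllc": served_urllc, "wasted": wasted}


def puncturing_usage(total_rbs, urllc_demand):
    """Puncturing: eMBB is scheduled everywhere, URLLC overwrites what it needs."""
    served_urllc = min(urllc_demand, total_rbs)
    embb_rbs = total_rbs - served_urllc          # eMBB loses only the punctured part
    return {"embb": embb_rbs, "urllc": served_urllc, "wasted": 0}


if __name__ == "__main__":
    for demand in (0, 2, 4):
        print(demand, orthogonal_usage(10, 3, demand), puncturing_usage(10, demand))
```

With no URLLC traffic, the orthogonal scheme leaves its reservation idle, whereas puncturing leaves all resources to eMBB; with heavy URLLC traffic, puncturing serves the full demand at the expense of eMBB resources.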
In this paper, we evaluate the puncturing technique to manage the radio resources in NS. As stated earlier, the instantaneous scheduling of URLLC transmission, which interrupts the eMBB traffic, has a significant impact on both system capacity and reliability. Furthermore, it leads to a degradation in the performance of the eMBB service. Hence, we develop an optimization-based approach to address the resource allocation problem, where it is important not just to maximize the capacity of the system, but also to consider the reliability of the URLLC and eMBB services.

A. CONTRIBUTIONS & CHALLENGES
It is challenging to handle the co-existence of eMBB and URLLC services over a common physical resource. In this paper, we address this issue by using the puncturing technique. We propose an efficient framework to ensure the capacity and reliability of the system while meeting the low-latency requirement. This study addresses not only the problem of maximizing eMBB rates while meeting latency requirements but also investigates the influence of URLLC traffic on system capacity and reliability. There are some constraints associated with URLLC services because URLLC-based services are optimized to operate independently of other services. URLLC systems are often optimized in a standalone manner, meaning that they are designed and implemented without considering other systems that may be operating in the same environment. It is challenging for this approach to manage the URLLC transmission characteristics in a dynamic network environment. Furthermore, in the worst case, it can break the URLLC reliability constraints in order to reach the optimal solution of the optimization problem, which can affect the QoS requirements.

Due to the heterogeneous environment, randomness, and stringent requirements of URLLC traffic, the radio resources need to be allocated intelligently. ML-based algorithms such as semi-supervised learning and DRL can solve complex optimization problems in real time in order to allocate the resources intelligently [15]. We apply the co-training method of semi-supervised learning in our strategy. In the existing co-training approach, a predefined policy fails to consider the sampling bias of the chosen samples between the labeled and unlabeled data samples [16]. Existing works on ML-based resource management are largely dependent on labeled data samples. Although unlabeled data samples can be generated easily, obtaining the output of each data sample requires a complex computational process. Thus, we propose a novel framework to improve the learning ability. In certain scenarios, acquiring labeled data for training DRL models can be challenging due to data sparsity. The Co-training DRL (CDRL) approach can mitigate this issue by leveraging a combination of labeled and unlabeled data. The labeled data provide valuable information for training the model, while the unlabeled data aid in discovering hidden patterns and improving generalization. This can be particularly beneficial in the context of eMBB RB allocation, where obtaining a large amount of labeled data might be difficult.

In this work, we propose a CDRL approach to address the eMBB resource allocation problem. We present a novel CDRL approach based on Q-learning to improve the policy by choosing the unlabeled samples after taking the action at each TTI. We generate the labeled data through a two-sided matching technique, and use DRL with a semi-supervised co-training method to predict the resource block for each user associated with the eMBB slice. The implementation of DRL in URLLC poses challenges due to the stringent latency and reliability requirements. Slow convergence of DRL can also be an issue in the implementation. We consider all these challenges in our work and propose a novel framework incorporating optimization-based techniques with semi-supervised learning and DRL to enhance the resource allocation capabilities for eMBB and URLLC traffic in 5G and beyond wireless networks. Our key contributions are:
• Firstly, the resource allocation problem is expressed as an optimization problem, where we aim to maximize the sum rate of the eMBB service while fulfilling the URLLC constraints.
• Secondly, we decompose the problem into two sub-problems: the eMBB resource block allocation strategy and URLLC scheduling. Each sub-problem is treated separately depending on its framework in order to obtain the optimal solution.
• In the eMBB resource block allocation strategy, we propose the CDRL approach, where we use DRL with co-training to predict the resource block for each user. To learn the best sample selection policy in co-training, we propose a Q-learning approach, which utilizes the policy to train the model.
• In the URLLC scheduling sub-problem, we present a DRL-based DDQN approach to meet the latency and reliability requirements and to intelligently manage the URLLC traffic over the punctured eMBB slots. We propose the DDQN approach based on Thompson sampling to overcome the problem of slow convergence.
• Finally, we evaluate the performance of the proposed schemes. Simulation results demonstrate that the proposed methodology can ensure the URLLC reliability requirements while maintaining a higher average sum rate for eMBB users.
Given the differing requirements of eMBB and URLLC, it is challenging to optimize resource allocation simultaneously to satisfy both types of services. The high data rate demands of eMBB may lead to increased latency and reduced reliability for URLLC services if resources are not allocated efficiently. Conversely, prioritizing URLLC requirements may result in under-utilization of resources and lower data rates for eMBB users. Optimizing each layer individually allows for fine-tuning and maximizing the performance metrics specific to that layer, enabling better performance for both eMBB and URLLC users. By employing different DRL algorithms customized for specific traffic types, resource allocation efficiency can be enhanced. In this work, our aim is to propose an approach that can effectively converge to near-global optimal solutions or provide satisfactory performance in practical settings, because resource allocation in network slicing is a complex and multi-dimensional optimization problem. It involves numerous variables, constraints, and objectives. The solution space can be vast and non-linear, making it difficult to analytically derive global optimal solutions. The problem complexity and the presence of local optima can limit the assurance of reaching the global optimum. However, combining CDRL with DDQN and Thompson sampling to solve the coexistence problem of eMBB and URLLC users provides a way to leverage the strengths of each algorithm to achieve near-global optimal solutions.

The rest of this paper is organized as follows. In Section II, we review the related work. Section III introduces our system model, including the URLLC data rate and the eMBB data rate after puncturing. The problem formulation is presented in Section IV. We then present the proposed resource block allocation (RBA) strategy in Section V and an intelligent URLLC scheduling framework based on deep reinforcement learning (DRL) in Section VI. Section VII presents the simulation results of the proposed algorithms, and Section VIII concludes the paper.

II. RELATED WORK
A. URLLC & EMBB REQUIREMENTS
Extensive research on RAN resource management is being carried out in both industry and academia. It mainly focuses on how to develop an effective RAN resource management approach and how to address the issues related to it. In [17], the authors presented a slicing approach for the LTE network to manage the resources efficiently, so that services can be provided to different mobile network operators (MNOs). A slicing and scheduling approach has been proposed in [18] to ensure services by allocating resource blocks (RBs) to each virtual network. For a single-cell orthogonal frequency-division multiple access (OFDMA) network, the authors proposed an effective sub-carrier and power allocation approach in [19]. Due to their limitations, the aforementioned works cannot meet the QoS requirements of the NGWN. With the advancement of applications, 5G systems need to support a massive number of devices while meeting the strict low-latency requirements. In [6], the authors described the main URLLC requirements and highlighted the related issues at the physical layer. In [20], the authors showed that overlapping URLLC traffic over eMBB transmission after every mini-slot can significantly improve the performance of a system in terms of resource efficiency. For the design of URLLC, the authors discussed theoretical aspects such as massive MIMO and medium access control (MAC) protocols in [21]. In [22], the constraints of URLLC were discussed, and a future research direction for URLLC in the NGWN, termed eXtreme URLLC (xURLLC), was given. To avoid transmission delay, the blocklength in URLLC should be finite, whereas Shannon's capacity theorem is applicable when the blocklength is infinite. In [23], the authors analyzed the resource management problem for the URLLC service given the achievable data rate in the context of finite blocklength. The optimization problem focuses on optimizing the power allocation and bandwidth allocation subject to the reliability and latency constraints.
In [24], the authors proposed an approach for Vehicle-to-Vehicle (V2V) networks based on an optimization problem that aims to reduce the power subject to latency and reliability limitations. They applied extreme value theory and defined the reliability measure with regard to the maximum queue length paired between vehicles. The work in [25] evaluated the joint optimization of V2V communications, which aims to optimize the radio resources, modulation schemes, and power control, and to increase the capacity of cellular users while ensuring the stringent requirements of vehicle users in terms of latency and reliability. To solve the joint optimization problem and reach the optimal solution, the authors used binary search methods and Lagrange dual decomposition. The study conducted in [26] proposed an approach based on concurrent scheduling of URLLC and eMBB traffic, with the objective of maximizing the capacity available to eMBB users while simultaneously ensuring compliance with stringent latency and reliability requirements. The authors discussed the effect of the incoming URLLC traffic on the eMBB service.
The authors in [27] presented a proportional fairness scheme, where radio resources are allocated to incoming URLLC transmission while guaranteeing the reliability requirements of eMBB and URLLC services. In [28], the authors discussed eMBB and URLLC transmission services in terms of cloud RAN, where multi-cast and unicast transmissions are marked for eMBB and URLLC services, respectively. A general revenue-based maximization problem was presented as mixed-integer nonlinear programming for RAN slicing. The work in [29] proposed an approach for eMBB and URLLC services to find the optimal policy for the resource scheduling problem. For multiplexing of eMBB/URLLC traffic, the authors in [30] studied the orthogonal multiple access (OMA) and non-orthogonal multiple access (NOMA) schemes and discussed the trade-offs between them. The results were simulated with different decoding schemes such as puncturing and successive interference cancellation (SIC), and they show that OMA minimizes the interference between eMBB and URLLC traffic but degrades the performance of the URLLC service, whereas NOMA with SIC improves the URLLC performance while enhancing the capacity of the eMBB service. In [27], a risk-sensitive framework was presented in order to manage the radio resources in NS. The resource allocation problem was specified as an optimization problem that aims to increase the capacity of the eMBB slice while considering the risk measure function. The aforementioned works do not discuss the impact of data transmission with URLLC requirements over eMBB slots, so we present an in-depth analysis and develop a dynamic resource allocation approach to schedule the URLLC and eMBB users effectively.

B. MACHINE LEARNING (ML) FOR RESOURCE MANAGEMENT
The allocation of radio resources in NS can be an issue, and it can be resolved by implementing machine learning (ML) algorithms [9], [31]. In wireless communication, supervised learning-based models such as deep neural networks (DNN) have been used widely by researchers. A DNN can solve complex problems, and it helps in finding the optimal solution to an optimization problem. In [32], the authors proposed a DNN-based algorithm to manage the radio resources, where the DNN was used to predict the transmit power policy. The work in [33] proposed a deep learning (DL) based algorithm to optimize the energy efficiency and spectrum efficiency in cognitive radio. The authors presented a convolutional neural network (CNN) based optimization problem in [34], which aims to determine the transmit power while maximizing the energy efficiency and spectrum efficiency with less computation time. Simulation results show that, after training the model, the presented approach predicts the transmit power with less computation time than other schemes. As presented in the above-mentioned studies [32]-[34], a DNN can be utilized without explicitly solving the complex optimal control problem of the wireless network. DL can be utilized as an intelligent tool to solve complex optimization problems in resource management, such as resource block allocation, power control, and scheduling. In a real-time environment, a DL-based resource management approach can determine the network and user state in the wireless network, which helps to manage the radio resources accordingly. This type of intelligent approach is crucial to meet the URLLC requirements in 5G and beyond wireless systems [35], [36].

The labeled samples can affect the performance of the DL model. In an ML-based approach to resource management, it is not difficult to obtain a large number of unlabeled samples. However, obtaining the result of each sample requires additional computation [37]. So, there is a need for an algorithm that enhances the learning performance and minimizes the dependency on labeled data. In this paper, we present a novel approach based on semi-supervised DRL with a co-training method to solve the resource allocation problem. Labeled samples are taken from the predicted approach, used for training, and combined with a large number of unlabeled samples. Co-training is a semi-supervised learning approach in which two learners are initialized. It utilizes the estimated labels on the unlabeled data, and samples are chosen based on the highest confidence. By applying the semi-supervised approach, the wastage of resources can be avoided and the issue of poor generalization can be solved.
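The conventional confidence-based co-training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CDRL scheme of this paper: each learner is a hypothetical one-dimensional nearest-centroid classifier that sees a single feature (its "view"), confidence is measured by the distance margin between the two closest centroids, and all function names and data values are chosen for illustration only.

```python
def fit_centroids(samples, labels, view):
    """Mean of feature `view` per class (a 1-D nearest-centroid learner)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        sums[y] = sums.get(y, 0.0) + x[view]
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_confidence(centroids, x, view):
    """Return (label, confidence); confidence is the margin between the
    distances to the two nearest class centroids."""
    dists = sorted((abs(x[view] - c), y) for y, c in centroids.items())
    label = dists[0][1]
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return label, margin


def co_train(labeled, labels, unlabeled, rounds=10):
    """Each round, each view labels its most confident unlabeled sample and
    adds it to the shared labeled set used by both learners."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                return labeled, labels
            centroids = fit_centroids(labeled, labels, view)
            scored = [(predict_with_confidence(centroids, x, view), i)
                      for i, x in enumerate(pool)]
            (label, _), idx = max(scored, key=lambda s: s[0][1])
            labeled.append(pool.pop(idx))
            labels.append(label)
    return labeled, labels


if __name__ == "__main__":
    X, y = co_train([(0.0, 0.0), (1.0, 1.0)], [0, 1],
                    [(0.1, 0.2), (0.9, 0.8), (0.2, 0.1), (0.8, 0.95)])
    print(list(zip(X, y)))
```

The selection rule in `co_train` is the fixed highest-confidence policy; a learned selection policy would replace the `max(...)` step.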
In recent times, many studies have been conducted on managing the radio resources by using DRL [15]. In [38], the authors presented a framework based on actor-critic reinforcement learning (RL) to optimize power, resource allocation, and joint selection of transmission mode in V2V-based device-to-device (D2D) enabled Internet of Vehicle (IoV) networks. It aims to increase the capacity of vehicle-to-infrastructure (V2I) nodes. The authors in [39] proposed a DRL-based framework to meet the URLLC requirements subject to power control and rate in the downlink of an OFDMA system. ML has been applied in various radio resource management (RRM) tasks, such as spectrum sensing, channel prediction, interference management, and resource allocation. Another promising application of ML in RRM is to predict channel conditions and optimize transmission parameters. In a recent study, the authors in [40] propose a novel approach for resource allocation in the RAN using Hierarchical Deep Learning (HDL) to meet the diverse QoS requirements of eMBB and URLLC services. The HDL model uses a two-level approach, with the first level performing resource allocation for eMBB and the second level optimizing resource allocation for URLLC. The proposed approach aims to maximize network utilization, minimize resource wastage, and ensure that the QoS requirements of both eMBB and URLLC services are met. However, it also has some limitations. The authors in [40] have not considered the impact of interference from neighboring cells, which may affect the QoS. Further, the approach assumes a centralized resource allocation scheme, which is not suitable for large-scale networks with distributed and dynamic traffic patterns.
The work in [41] presented a DRL-based deep Q-learning framework for the co-existence problem of eMBB and URLLC. The existing works do not address the large action space problem (i.e., the increased number of possible actions at each time slot) when deciding on the allocation of RBs. An agent starts exploring meaningless actions (e.g., actions that cannot fulfill the URLLC constraints), which results in a slow convergence rate that affects the performance of the DRL method. We propose a novel framework incorporating a DRL-based double deep Q-learning network (DDQN) with Thompson sampling to enhance the resource allocation capabilities for URLLC traffic in 5G and beyond wireless networks. The authors in [42] present a dynamic RL approach for resource provisioning in virtualized networks, specifically targeting D2D-based communications. Their framework adopts a three-stage layered structure, wherein the initial stage introduces a dynamic virtual resource allocation scheme based on DRL. In [43], the authors propose an approach for autonomously provisioning and customizing resources in a virtualized RAN to accommodate mixed traffic. Their scheme leverages a DRL algorithm to dynamically allocate resources based on the specific requirements of different traffic types.

Existing studies have used the ε-greedy approach as the exploration-exploitation strategy in DRL-based approaches to address the resource allocation problem in network slicing. Using Thompson sampling in the DRL approach can potentially provide advantages over the greedy method [44]. Greedy methods typically focus on exploitation by always selecting the action with the highest estimated reward. While this can be effective in some cases, it may lead to sub-optimal solutions or to being stuck in local optima. Thompson sampling, in contrast, uses a probabilistic approach where the selection of each action is influenced by a distribution. This distribution allows for exploration by occasionally selecting sub-optimal actions to gather more information about their potential rewards. By exploring different options, Thompson sampling can potentially discover better resource allocation strategies that may not be immediately apparent through a purely greedy approach. The network conditions, user demands, and traffic patterns may vary over time. Using Thompson sampling enables the DRL agent to continually learn and adapt to these changing conditions. It can adjust its resource allocation decisions based on the most recent information, leading to improved performance and responsiveness. Efficient exploration can lead to quicker identification of optimal or near-optimal allocation strategies.
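The difference between the two exploration strategies can be illustrated on a simple multi-armed bandit. The sketch below is only an illustration and is not the DDQN-based scheduler of this paper: it assumes Bernoulli rewards with Beta(1, 1) priors for Thompson sampling, and the arm probabilities, the value of ε, and all function names are hypothetical.

```python
import random


def thompson_select(successes, failures, rng):
    """Draw one sample per arm from its Beta posterior and pick the largest."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)


def epsilon_greedy_select(successes, failures, rng, epsilon=0.1):
    """With probability epsilon pick a random arm, else the best empirical mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(successes))
    means = [s / (s + f) if s + f else 0.0 for s, f in zip(successes, failures)]
    return max(range(len(means)), key=means.__getitem__)


def run_bandit(select, true_probs, steps=2000, seed=0):
    """Play `steps` rounds and return (total reward, successes, failures)."""
    rng = random.Random(seed)
    successes = [0] * len(true_probs)
    failures = [0] * len(true_probs)
    total = 0
    for _ in range(steps):
        arm = select(successes, failures, rng)
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += reward
    return total, successes, failures


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical per-action success probabilities
    for name, policy in (("thompson", thompson_select),
                         ("eps-greedy", epsilon_greedy_select)):
        total, s, f = run_bandit(policy, probs)
        print(name, total, [a + b for a, b in zip(s, f)])
```

In Thompson sampling, exploration shrinks automatically as the posterior of each action concentrates, whereas ε-greedy keeps exploring uniformly at a fixed rate.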

III. SYSTEM MODEL
We consider the downlink transmission scenario of a heterogeneous network. In the considered scenario, the coverage area of a macro cell is populated with a random distribution of multiple small cells, and the set of all BSs is denoted by $\mathcal{B} = \{1, 2, \dots, b, \dots, |\mathcal{B}|\}$. We focus on two kinds of downlink requests, eMBB and URLLC. As shown in Fig. 1, there are different kinds of UEs, such as AR/VR devices, smart transportation, and smartphones, scattered randomly and connected to each BS. In this model, several edge servers are placed at the edge of the network, and these edge servers are linked to a larger centralized cloud server. The sets of eMBB and URLLC users present in the network are denoted as $\mathcal{W}^{e}_{b} = \{1, \dots, W^{e}_{b}\}$ and $\mathcal{W}^{u}_{b} = \{1, \dots, W^{u}_{b}\}$, respectively. The available radio resources in 5G NR are represented in the frequency and time domains, which are divided into $N$ radio resources or RBs, where each RB has a bandwidth $B$ in the frequency domain. The set of RBs is defined as $\mathcal{N} = \{1, \dots, n, \dots, N\}$. In the time domain, every TTI has a duration of 1 ms, so in one time slot there are a total of $N$ available RBs. Each RB has 7 symbols and consists of 12 sub-carriers, so there are 84 resource elements (RE) in a single RB. The available time slot is further split into $K$ smaller units called short TTIs or mini-slots. Generally, to enhance the spectral efficiency (SE), the eMBB service spans multiple TTIs. Due to stringent latency requirements, the incoming URLLC traffic cannot be held back during the ongoing eMBB communication service. So, we puncture the eMBB slots and transmit the URLLC traffic. In this regard, we schedule the URLLC service at a short TTI (duration of 0.5 ms) and the eMBB service with a duration of 1 ms. Because of this, the instantaneous scheduling of URLLC transmission, which involves interrupting eMBB traffic, can have a significant impact on both the system's capacity and reliability, leading to a degradation in the performance of the eMBB service. So, a proper framework is required to meet the QoS requirements.

Notation (table fragment): $Q^{\pi}(s_t, a_t)$, cumulative discounted reward of policy $\pi$; $\mathrm{Cov}_{a_t}$, covariance for every action $a_t$; total number of URLLC data that arrived at TTI $t$ (denoted $\psi(t)$ in Section IV).
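The resource grid arithmetic above can be checked with a few lines of code. The sketch below uses only the numbers stated in this section (7 symbols and 12 sub-carriers per RB, a 1 ms TTI, and a 0.5 ms short TTI); the number of RBs passed to `grid_resource_elements` and the resulting number of mini-slots per TTI are illustrative assumptions rather than simulation parameters.

```python
SYMBOLS_PER_RB = 7        # symbols per RB, as stated in the system model
SUBCARRIERS_PER_RB = 12   # sub-carriers per RB, as stated in the system model
TTI_MS = 1.0              # eMBB TTI duration
MINI_SLOT_MS = 0.5        # URLLC short-TTI duration


def resource_elements_per_rb():
    """Resource elements in one RB: symbols times sub-carriers."""
    return SYMBOLS_PER_RB * SUBCARRIERS_PER_RB


def mini_slots_per_tti(tti_ms=TTI_MS, mini_slot_ms=MINI_SLOT_MS):
    """Number of whole mini-slots that fit in one TTI (K under these durations)."""
    return int(round(tti_ms / mini_slot_ms))


def grid_resource_elements(num_rbs):
    """Total resource elements available in one TTI for `num_rbs` RBs."""
    return num_rbs * resource_elements_per_rb()


if __name__ == "__main__":
    print(resource_elements_per_rb())   # 84
    print(mini_slots_per_tti())         # 2
    print(grid_resource_elements(50))   # 4200 (50 RBs is a hypothetical grid size)
```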

A. EMBB THROUGHPUT
Transmitting the URLLC traffic over the punctured eMBB slots can affect the bit rate of eMBB services. We introduce a binary decision variable for the purpose of puncturing, which takes the value 1 if the $k$-th mini-slot is punctured by the $w$-th URLLC user, $\forall n \in \mathcal{N}$, and 0 otherwise. (1)
Allocating the radio resources to the users is constrained, because each RB needs to be assigned to an active user. We assume that one RB of each BS is occupied by a single user. Mathematically, the RBA strategy can be expressed as: [equation not available in the source text] (2)
The signal-to-interference-plus-noise ratio (SINR) of the eMBB user $w$ can be computed as: [equation not available in the source text]
where $p^{e,w}_{b,n}(t)$ and $g^{e,w}_{b,n}(t)$ represent the transmitted power and channel gain, respectively, of eMBB user $w$ of BS $b$ over RB $n$, and $\sigma^{2}$ is the noise power. The throughput of an eMBB user $w$ of BS $b$ on RB $n$ at time slot $t$ can be approximated as: [equation not available in the source text]
where the subtracted term represents the loss of eMBB rate due to puncturing. Thus, the total sum rate achieved by the eMBB user $w$ can be expressed as: [equation not available in the source text]
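The effect of puncturing on the eMBB rate can be sketched numerically. The code below is a hedged illustration, not the exact expression of this section: it assumes a linear rate-loss model in which the Shannon rate of an RB is scaled by the fraction of mini-slots that are not punctured, and the function names, the optional interference term, and the numerical values are hypothetical.

```python
import math


def embb_sinr(tx_power, channel_gain, noise_power, interference=0.0):
    """SINR of an eMBB user on one RB: received power over interference plus noise."""
    return tx_power * channel_gain / (interference + noise_power)


def punctured_embb_rate(bandwidth_hz, sinr, punctured):
    """Rate on one RB in bit/s under an assumed linear puncturing loss.

    `punctured` holds one 0/1 entry per mini-slot of the TTI, where 1 means
    the mini-slot was taken over by URLLC traffic.
    """
    loss_fraction = sum(punctured) / len(punctured)
    return bandwidth_hz * (1.0 - loss_fraction) * math.log2(1.0 + sinr)


def embb_sum_rate(bandwidth_hz, sinrs, puncturing_per_rb):
    """Sum rate of one eMBB user over all RBs assigned to it."""
    return sum(punctured_embb_rate(bandwidth_hz, s, p)
               for s, p in zip(sinrs, puncturing_per_rb))


if __name__ == "__main__":
    sinr = embb_sinr(tx_power=2.0, channel_gain=0.5, noise_power=0.25)
    print(punctured_embb_rate(180e3, sinr, [0, 0]))  # no puncturing
    print(punctured_embb_rate(180e3, sinr, [1, 0]))  # half of the TTI punctured
```

Under this assumed model, puncturing one of two mini-slots halves the rate of the affected RB, which makes the cost of every URLLC placement explicit in the objective.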

B. URLLC THROUGHPUT
To avoid transmission delay, the blocklength in URLLC should be finite, whereas Shannon's capacity theorem is applicable when the blocklength is infinite. In [23], the authors analyzed the resource management problem for the URLLC service given the achievable data rate in the finite blocklength regime. The work in [45] describes the achievable data rate in URLLC for finite blocklength as follows: [equation not available in the source text]
where $\zeta^{u,w}_{b,n}(t)$ refers to the SINR of the URLLC user, expressed in (7) as a ratio whose denominator is the eMBB interference plus the noise power $\sigma^{2}$ (the numerator is not available in the source text). Here, $Y^{u,w}_{b,n}$ indicates the dispersion of the channel, which characterizes the channel randomness of the user, and can be represented as: [equation not available in the source text]
The number of symbols in each mini-slot is represented by $v^{u,w}_{b,n}(t)$, and $Q^{-1}(x)$ represents the inverse of the Gaussian Q-function, where $x$ indicates the error rate.
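A finite-blocklength rate of this kind can be sketched with the widely used normal approximation. The code below is an assumption-labeled illustration and may differ from the exact expression in [45]: it takes the rate per channel use as $\log_2(1+\zeta) - \sqrt{Y/v}\,Q^{-1}(x)\log_2 e$ with dispersion $Y = 1 - (1+\zeta)^{-2}$, and the function names and numerical values are hypothetical.

```python
import math
from statistics import NormalDist


def q_inverse(error_rate):
    """Inverse Gaussian Q-function: Q^{-1}(x) = Phi^{-1}(1 - x)."""
    return NormalDist().inv_cdf(1.0 - error_rate)


def channel_dispersion(sinr):
    """Channel dispersion under the assumed normal approximation."""
    return 1.0 - 1.0 / (1.0 + sinr) ** 2


def urllc_rate_per_channel_use(sinr, num_symbols, error_rate):
    """Achievable rate in bits per channel use for a finite blocklength.

    The Shannon term is reduced by a penalty that grows as the blocklength
    (`num_symbols`) shrinks or the target error rate becomes stricter.
    """
    shannon = math.log2(1.0 + sinr)
    penalty = (math.sqrt(channel_dispersion(sinr) / num_symbols)
               * q_inverse(error_rate) * math.log2(math.e))
    return max(shannon - penalty, 0.0)


if __name__ == "__main__":
    for symbols in (7, 14, 168):
        print(symbols, urllc_rate_per_channel_use(10.0, symbols, 1e-6))
```

As the number of symbols grows, the penalty term vanishes and the rate approaches the Shannon limit, which is why short mini-slots make the URLLC rate sensitive to the reliability target.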

IV. PROBLEM FORMULATION
In this paper, we aim to maximize the sum rate of the eMBB users while fulfilling the URLLC constraints. The resource allocation problem is formulated as an optimization problem. In the beginning, we assign transmission power and RBs to eMBB UEs at each TTI. We assume that the total power is distributed equally over all sub-carriers. Then, we puncture the eMBB slots and transmit the URLLC traffic over them. Puncturing can affect the capacity and reliability of the system. So, we propose a new approach to maximize the eMBB rate subject to URLLC constraints while minimizing the effect on eMBB reliability. For URLLC users, we suppose that the users create small packet fragments, and the packet arrivals at mini-slot $k \in \mathcal{K} = \{1, \ldots, k, \ldots, K\}$ at TTI t follow a Poisson point process (PPP) distribution. We denote the number of arrived small packets by a random variable $\psi_k(t)$, such that $\psi(t)$ indicates the total number of URLLC packets that arrived at TTI t. Thus, the reliability of the URLLC service can be obtained by the following equation:

In (11), the objective (11a) aims to maximize the eMBB rate. Constraint (11b) indicates the RB allocation limitations and ensures that only a single user is associated with an RB, whereas (11c) represents the puncturing constraint. Constraint (11d) guarantees the URLLC reliability. The key objective is to allocate the resources dynamically in order to increase the capacity (in terms of sum rate) of the eMBB users subject to these constraints. It can be observed in (11) that the optimization problem P is an NP-hard non-convex problem, and it is challenging to find the optimal solution in general. An intelligent approach is therefore required to solve this optimization problem.

The resource allocation approaches for URLLC and eMBB services are different. URLLC services need to meet the low-latency requirements and also need prioritized access to the network, while eMBB services require a high data rate and optimized network utilization. These differences in resource allocation strategies make it difficult to optimize both services simultaneously, and decomposing the optimization problem helps to optimize each service's resource allocation separately. To solve the resource allocation optimization problem, we break the problem P into two sub-problems, P1: RB allocation for the eMBB slice, and P2: URLLC scheduling.
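The URLLC arrival model described above can be illustrated with a short simulation. The following is a minimal sketch, assuming an arrival rate of 2 packets per mini-slot (a hypothetical value) and seven mini-slots per TTI; the function names `poisson_sample` and `urllc_arrivals` are illustrative and do not come from the paper.

```python
import math
import random


def poisson_sample(rate, rng):
    """Draw one Poisson-distributed count using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def urllc_arrivals(rate_per_minislot, num_minislots, rng):
    """Return per-mini-slot arrivals psi_k(t) and their total psi(t) for one TTI."""
    per_slot = [poisson_sample(rate_per_minislot, rng) for _ in range(num_minislots)]
    return per_slot, sum(per_slot)


# Hypothetical setting: 7 mini-slots per TTI, mean of 2 packets per mini-slot.
rng = random.Random(0)
psi_k, psi_total = urllc_arrivals(rate_per_minislot=2.0, num_minislots=7, rng=rng)
print(psi_k, psi_total)
```

Here `psi_total` plays the role of $\psi(t)$, the total URLLC load that the scheduler has to place onto punctured eMBB mini-slots in that TTI.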

V. RB ALLOCATION STRATEGY FOR EMBB SLICE
The RB allocation problem can be expressed as: where Â indicates the forecasted RB allocation strategy.
In Algorithm 1, we present the two-sided matching approach used to produce the initial RB allocation strategy. RBs and the users associated with different slices are considered as two sets of contestants, each seeking to maximize its own objective function. Co-training is a semi-supervised learning method in which two models are trained by utilizing a large amount of unlabeled data. From Algorithm 1, we generate the labeled data, which consists of the gain values $g^{u,w}_{b,n}(t)$ and the RB allocation strategy $a^{b}_{w,n}(t)$. Algorithm 1, based on the two-sided matching technique, serves as the initial RB allocation mechanism for the CDRL approach for eMBB RB allocation. It provides a foundation for further optimization by supplying an initial allocation of RBs based on user preferences, which can then be refined by the CDRL approach. The CDRL algorithm learns from the initial RB allocation and user feedback to improve the RB allocation policy over time. By continuously interacting with the environment and optimizing the allocation based on the learned policy, the CDRL approach can improve the RB allocation efficiency and adapt to changing network conditions. The parameters $\Omega^{b}_{w,n}$ and $\Omega^{b}_{w}$ indicate the users assigned to RB (n) and the users of unallocated RBs, respectively.

Algorithm 1:
Initialize $\Omega^{b}_{w,n}$ as users assigned to RB(n)
4: Specify $\Omega^{b}_{w}$ for users of unallocated RBs
5: while for users do
if $\Omega^{b}_{w,n} = 2$ then
13: The utility function of the two users assigned to RB(n) needs to be calculated
end while
19: end for

It can be presented in matrix form as follows: whereas the unlabeled data can be presented as: where l is the number of data samples. Our main objective is to predict the label value A from the unlabeled data G. The existing co-training method is based on a policy of choosing the samples which have high-confidence values. In this paper, we use a DRL-based q-learning approach to improve the policy by choosing the unlabeled samples after taking the action $a_t$ at each TTI. First, we decompose the unlabeled samples into various sub-samples according to their similar traffic behavior. The DRL agent employs a policy to choose one sub-sample at each TTI t instead of selecting one sample, which can enhance the computational efficiency and reduce the latency, and then the two learners are updated. The decomposition of unlabeled samples can be presented as follows: where j is the number of data samples.

First, the two learners are trained with a small amount of labeled data $d_l$ at the start of training. At each TTI, the DRL agent takes a decision (action), and then the unlabeled sub-samples are chosen to train the learners. The backbone of our proposed model is the q-learning approach, where the best-quality unlabeled sub-samples are chosen for co-training by the agent after it has learned the optimal policy through training. The agent observes the state space $s_t$ at each TTI t and takes the best possible action $a_t$, and then the two learners $Z_1$ and $Z_2$ are updated with $\mho_u$. Our objective is to train the learner Z, which can accurately predict the RB allocation such that $Z : G \rightarrow A$. Let us assume that $Z(g \mid \theta)$ indicates the Gaussian distribution; then the distribution can be presented as [37]: Each data sample has a corresponding latent variable vector. This variable is determined by the mixture coefficient $\delta_i$, and the corresponding component h can be obtained by using this coefficient. The probability of h and g can be given as $P(v_i \mid g_i, h_i)$. The optimal classification can be formulated as: where; From the above equation, it can be observed that training samples can be used to predict Z(g).
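The idea of a two-sided matching between users and RBs can be sketched with a generic deferred-acceptance procedure. This is a swapped-in textbook technique used only for illustration, not a reproduction of Algorithm 1: it assumes that each user ranks RBs by channel gain and that an RB contested by two users keeps the one with the higher gain. The name `initial_rb_allocation` and the gain values are hypothetical.

```python
def initial_rb_allocation(gain):
    """gain[w][n]: channel gain of user w on RB n (used here as the utility).

    Returns a dict mapping RB index -> user index, with at most one user per RB.
    """
    num_users, num_rbs = len(gain), len(gain[0])
    # Each user ranks the RBs from best to worst gain.
    preferences = [
        sorted(range(num_rbs), key=lambda n, w=w: -gain[w][n])
        for w in range(num_users)
    ]
    next_choice = [0] * num_users  # index of the next RB each user will try
    holder = {}                    # RB -> user currently holding it
    unmatched = list(range(num_users))
    while unmatched:
        w = unmatched.pop()
        if next_choice[w] >= num_rbs:
            continue  # this user has been rejected by every RB
        n = preferences[w][next_choice[w]]
        next_choice[w] += 1
        rival = holder.get(n)
        if rival is None:
            holder[n] = w
        elif gain[w][n] > gain[rival][n]:
            # Two users contend for RB n: keep the one with the higher utility.
            holder[n] = w
            unmatched.append(rival)
        else:
            unmatched.append(w)
    return holder


# Hypothetical gains for 3 users and 2 RBs.
allocation = initial_rb_allocation([[5.0, 1.0], [4.0, 3.0], [1.0, 2.0]])
print(allocation)
```

The resulting (gain, allocation) pairs are the kind of labeled samples that the co-training learners would start from; users left without an RB correspond to the set of users of unallocated RBs.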

State Space:
The agent should be familiar with the distribution of the unlabeled samples in order to choose the best sub-samples. We examine the probability distributions of the two learners, which can be formulated as: where β and γ represent the probability distributions of the two learners $Z_1$ and $Z_2$, respectively, and Cat indicates the concatenation operation, whereas $f(g; s_t, h, \theta)$ represents a nonlinear likelihood function.

Action Space
The q-learning agent chooses the best possible action by choosing the best quality unlabeled sub-samples at TTI t after learning the optimal policy such that

Reward
The reward of each learner can be formulated as: where $r_1$ and $r_2$ represent the model accuracy of learners $Z_1$ and $Z_2$, respectively, determined on the labeled testing data samples at TTI t. The agent aims to take a decision or action $a_t$ at each TTI t which can increase the future discounted reward. Here, Λ refers to the discount factor. The main focus is to maximize the reward $R_t$ by finding an optimal policy. The q-agent in the q-learning network learns the optimal policy by interacting with the two learners, which act as the environment. The loss function can be presented as: where, The above equation indicates that θ is learned toward the optimal policy by using the gradient descent method. During testing, the two learners $Z_1$ and $Z_2$ and the q-learning agent were simulated together without the labeled validation samples. The agent learns the optimal policy, takes action $a_t$, and chooses the unlabeled sub-samples. Finally, learner Z can be defined as: where φ indicates the weight factor based on the learning policy. The details are provided in Algorithm 2. A depiction of the CDRL framework is presented in Fig. 2.

Algorithm 2: CDRL approach for eMBB RB allocation
Take action $a_t = \arg\max_a Q(s_t, a)$;
Update parameter θ;
13: Compute loss function using (32);
14: end for
15: end for
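The discounted reward and the q-learning update that drive the agent can be illustrated in tabular form. This is a minimal sketch under stated assumptions: the paper uses a DNN-based q-network trained by gradient descent, whereas the table-based `q_update` below is the textbook rule shown only to make the role of the discount factor Λ concrete. All names and numbers are hypothetical.

```python
def discounted_return(rewards, discount):
    """Sum of rewards weighted by increasing powers of the discount factor."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


def q_update(q_table, state, action, reward, next_state, lr, discount):
    """One tabular q-learning step toward reward + discount * max_a Q(next_state, a)."""
    target = reward + discount * max(q_table[next_state])
    q_table[state][action] += lr * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical example: three unit rewards with a discount factor of 0.5.
print(discounted_return([1.0, 1.0, 1.0], 0.5))

# Hypothetical two-state, two-action table.
q_table = {0: [0.0, 0.0], 1: [1.0, 2.0]}
print(q_update(q_table, state=0, action=1, reward=1.0, next_state=1, lr=0.5, discount=0.9))
```

In the co-training setting, the reward fed to such an update would be derived from the accuracies of the two learners on the labeled validation samples.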

VI. URLLC SCHEDULING
Due to the heterogeneity of URLLC traffic, it has become essential to intelligently and dynamically assign the radio resources to the incoming URLLC traffic. Thus, we present a DRL-based URLLC scheduling approach to manage the radio resources for the incoming URLLC traffic. We can state the URLLC scheduling problem as follows: The URLLC scheduling obtained by the CDRL algorithm cannot fulfill the low-latency and reliability constraints due to the slow convergence of the DRL-based q-network. So, we use the CDRL approach for eMBB resource allocation and propose a novel approach based on Thompson sampling for URLLC scheduling. We ensure that constraints (27b)-(27d) are met while actively engaging with the environment. In this algorithm, we present a DDQN-based approach to meet the latency and reliability requirements and to intelligently manage the URLLC traffic over the punctured eMBB slots. An RL model is described by action, state, and reward.

State-space
We define the state space $s_t$ by describing the throughput of each user of BS b associated with the eMBB service without puncturing, which depends on the channel gain, allocated RBs, incoming URLLC traffic, and transmission power. So, the throughput of each user associated with the eMBB service without puncturing can be presented as: Thus, the state space $s_t$ can be defined as: where $\psi(t)$ is defined in (9) and $g(t)$ is the channel gain.

Action space
The action space a t can be described as the N ×K puncturing matrix which indicates the K number of mini-slots within each RB that have been punctured.

Reward
Considering the QoS requirements of different slices and associated applications, we present a reward function which can be given as: where $\vartheta(t)$ indicates the time-varying weight coefficient of part II. We introduce this coefficient to ensure the URLLC reliability constraint. The following equation can be used to describe it: where $\eta(t)$ represents the achieved outage probability as stated in (10). Part I represents the eMBB rate we want to maximize, whereas part II indicates the URLLC constraint. The agent aims to select an optimal policy π in order to increase the reward, which means that the lowest outage probability and the highest sum rate are achieved. Under the policy $\pi = \pi^{K}_{a}$, the agent observes the given network state $s_t$ and takes action $a_t$ on the number of punctured mini-slots K from each allocated RB a. Then, by using (30), the agent calculates the reward based on the decisions taken, and the new state information of the network is given to the agent. Let us assume that $Q^{\pi}(s_t, a_t)$ indicates the q-function; the cumulative discounted reward for the given network state with a policy π can be formulated as: where $\Lambda(t)$ and $s_0$ represent the discount factor and initial state, respectively. The above function only takes the current reward into account. According to [46], it can be rewritten as: A DNN is used for the approximation of the above function.
The main objective of the earlier mentioned approach is to find the optimal policy π which can increase the reward. The optimal policy π can be expressed as follows: To optimize the policy π in (34), different RL techniques can be employed, such as policy gradient and q-learning. However, the work in [47] shows that the q-learning technique converges slowly and struggles to find the optimal policy, whereas the policy gradient method results in high variance and converges to a local optimum. Thus, we propose the DDQN method with Thompson sampling to learn the policy, which results in a faster convergence rate.

A. DDQN WITH THOMPSON SAMPLING
We present the Thompson sampling method with DDQN in order to improve the convergence rate and balance exploitation and exploration. Thompson sampling is a probability-based exploration method in which the agent selects an action randomly according to its probability of being the best one.
Thompson sampling is an effective and efficient method for balancing exploitation and exploration, because the agent rarely selects actions with low probability and avoids spending time on meaningless explorations, which results in a faster convergence rate [48]. Therefore, combining DDQN with Thompson sampling results in reliable and effective resource management for URLLC traffic. It helps in handling large state spaces, as it avoids exhaustive exploration of the entire space. Actions with higher probabilities of being optimal are more likely to be selected. By repeatedly sampling and selecting actions based on the estimated probabilities, the algorithm gradually learns which actions are more likely to yield better results. In our previous work [49], we employed Thompson sampling to enhance network efficiency and fulfill the stringent URLLC requirements within a resource-constrained and highly dynamic V2X (Vehicle-to-Everything) environment. The DDQN method was proposed by van Hasselt [50] to solve the overestimation problem in q-learning. The DDQN utilizes two different DNNs: 1) the deep q network (DQN), and 2) the target network. It can be mathematically expressed as follows: where $\hat{a} = \arg\max_{a} Q^{\pi}(s_{t+1}, a)$. Further, $Q^{\pi}(s_{t+1}, \hat{a})$ refers to the target network, where the DQN chooses the maximum Q-value by taking the best action a of the next state. Then the target network Q calculates the approximated Q-value by taking action $\hat{a}$. The Q-value of the DQN is updated based on the approximation from the target network Q. Then the parameters of Q are updated based on the DQN parameters. The architecture of DDQN comprises a DNN where the Q-value is a linear function of the last-layer features. Thus, for any network state $s_t$ and action $a_t$, it can be expressed as follows: where $\omega_{a_t}$ and $\phi_{\theta}(s_t)$ denote the weight of the last layer and the linearity of the output layer parameterized by θ, respectively. Similarly, the output layer and weight of the target network can also be represented by $\phi_{\theta}(\cdot)$ and $\omega_{a_t}$, respectively. Further, (35) and (36) can be rewritten as: where,

$\hat{a}_t = \arg\max_{a} \phi_{\theta}^{T}\, \omega_{a_t}$ (39)

The loss function can be computed as: We employ Gaussian Bayesian linear regression in order to approximate the posterior on the weight of the last layer and the Q-network function. In this paper, we estimate the distribution by using Gaussian Bayesian linear regression over the Q-values and formulate an effective and balanced exploration-exploitation scheme by utilizing Thompson sampling. The posterior distribution is estimated as:

$w_{a_t} \sim \mathcal{M}(w_{a_t}, \mathrm{Cov}_{a_t})$

where $w_{a_t}$ and ℘ indicate the mean and variance of the likelihood, respectively. Through (42), the agent employs Thompson sampling to sample $w_{a_t}$ around the mean $w_{a_t}$ and covariance Cov for every decision $a_t$. The DDQN agent keeps the prior and, at the beginning of each TTI, updates the posterior, extracts the weight of the last layer, and follows the optimal policy π. The training details are given in Algorithm 3. Fig. 3 shows the block diagram of the proposed framework. Initially, the BS assigns RBs to eMBB service users according to the optimal policy obtained by the CDRL approach. Then it sends the state space to Algorithm 3.
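The sampling step described above, drawing last-layer weights from a Gaussian posterior and acting greedily on the sampled Q-values, can be sketched as follows. This is an illustrative sketch only: it assumes the posterior mean and covariance per action are already available (the Bayesian linear regression update itself is not shown), and the names `cholesky`, `sample_weights`, and `thompson_action` as well as all numerical values are hypothetical.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_weights(mean, cov, rng):
    """Draw w ~ N(mean, cov) via w = mean + L z with z ~ N(0, I)."""
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [
        m + sum(lower[i][k] * z[k] for k in range(i + 1))
        for i, m in enumerate(mean)
    ]


def thompson_action(features, means, covs, rng):
    """Sample one weight vector per action; pick the action with the largest sampled Q."""
    sampled_q = []
    for mean, cov in zip(means, covs):
        w = sample_weights(mean, cov, rng)
        sampled_q.append(sum(f * wi for f, wi in zip(features, w)))
    return max(range(len(sampled_q)), key=sampled_q.__getitem__)


# Hypothetical last-layer features phi(s_t) and per-action posteriors.
rng = random.Random(0)
phi = [1.0, 0.5]
means = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]]
covs = [[[0.2, 0.0], [0.0, 0.2]]] * 3
print(thompson_action(phi, means, covs, rng))
```

As the posterior covariance shrinks, the sampled Q-values concentrate around their means and the selection becomes effectively greedy, which is how exploration fades out over training.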
The experience replay buffer of the proposed Algorithm 3 is initialized based on the results obtained by the CDRL approach. This information serves as input to the URLLC RB allocation decision-making process. Then, Algorithm 3, which is based on Thompson sampling, chooses an action based on its observed environment, perceives the immediate reward $r_t$ and next state $s(t+1)$, and accumulates the state, action, reward, and next state in the experience replay buffer. The transfer of states between the eMBB and URLLC components facilitates a collaborative learning process. It allows the components to leverage relevant information from each other to improve the overall RB allocation performance and ensures that the specific requirements of both eMBB and URLLC users are considered. Finally, the weight coefficient value $\vartheta(t)$ is updated.
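The interaction between the experience replay buffer and the double-DQN target can be sketched in a few lines. This is a minimal sketch, assuming the two networks are available as plain callables that return a list of Q-values per action; `ReplayBuffer`, `ddqn_target`, and the stand-in networks are hypothetical names introduced for illustration.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.memory.append((state, action, reward, next_state))

    def sample(self, batch_size, rng):
        return rng.sample(list(self.memory), min(batch_size, len(self.memory)))


def ddqn_target(reward, next_state, online_q, target_q, discount):
    """Double-DQN target: the online net picks the action, the target net scores it."""
    online_values = online_q(next_state)
    best_action = max(range(len(online_values)), key=online_values.__getitem__)
    return reward + discount * target_q(next_state)[best_action]


# Hypothetical stand-ins for the two networks (three actions).
def online_q(state):
    return [1.0, 3.0, 2.0]


def target_q(state):
    return [10.0, 0.5, 20.0]


buffer = ReplayBuffer(capacity=2)
for step in range(3):
    buffer.push(state=step, action=0, reward=1.0, next_state=step + 1)

rng = random.Random(0)
for state, action, reward, next_state in buffer.sample(2, rng):
    print(ddqn_target(reward, next_state, online_q, target_q, discount=0.9))
```

Decoupling action selection (online network) from action evaluation (target network) is what counters the overestimation that plain q-learning exhibits; in the stand-in example the target uses 0.5 rather than the target network's own maximum of 20.0.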

VII. PERFORMANCE ANALYSIS
We show the performance of our proposed algorithms in this section through a comprehensive empirical analysis for different parameters. The network dynamics are modeled by considering factors which include channel conditions, interference levels, traffic variations, and resource utilization.
The model can simulate the evolution of these factors over time, enabling the DRL agent to observe and learn from the network dynamics during the training process. The dynamics of the network model are incorporated into the DRL training by allowing the DRL agent to observe the current state of the network, take actions, and observe the resulting state transitions and rewards. By interacting with the model, the DRL agent can learn to make optimal resource allocation decisions in response to changes in the network dynamics. The Thompson sampling algorithm can adaptively explore and exploit actions based on their estimated probabilities of being optimal. This allows the algorithm to dynamically adjust its scheduling decisions in response to changes in network conditions and requirements. In this work, we evaluate our results by comparing them with different approaches: PGACL [11] (a risk-averse approach to increase the reliability), Q-learning, DQN, the optimal approach, and random search. PGACL achieves policy learning with a rapid convergence rate by integrating policy and value learning. The algorithm leverages the gradient method. PGACL is made up of the actor and the critic. The actor component is responsible for policy control based on the current state of the network, determining the actions to be taken. On the other hand, the critic component evaluates the effectiveness of the chosen policy by utilizing the reward function, providing feedback on the quality of the selected actions.

Algorithm 3:
Observe the network state $s_t = r^{e}_{b,w}(t), g(t), \psi($
Samples a Q-function
7: if t mod posterior update period = 0 then
if t mod posterior sampling period = 0 then
11: Extract samples using (42)
12: end if
13:
Set θ ← θ after every target update
Update parameter θ by minimizing a loss function
21: end for

A. SIMULATION FRAMEWORK
The eMBB and URLLC services are utilized by a diverse set of users randomly scattered across a 3-cell cluster in a 4km area and control packets are sent between network nodes and devices.The duration of 1ms is assigned to TTI, and further, each TTI is decomposed into seven orthogonal mini-  2.

B. PERFORMANCE EVALUATION OF CDRL ALGORITHM
In this section, we analyze the performance of the CDRL algorithm for RB allocation and compare the results with a single DNN [40] and semi-supervised learning (SSL) with DL. Fig. 4 shows the eMBB sum rate obtained by semi-supervised learning with DL and a single DNN for RBA. It can be seen that the system performs differently under the different schemes. CDRL performs better than the other schemes, with values ranging from 20 Mbit/s to 60 Mbit/s. It is evident that the proposed CDRL algorithm outperforms the DL schemes. This shows that co-training with DRL can solve the problem of RB allocation for eMBB service users in NS.

C. RELIABILITY EVALUATION OF URLLC
First, we analyze the worst-case URLLC reliability achieved by Algorithm 3, based on DDQN with Thompson sampling, and compare its performance with Q-learning and PGACL. The URLLC reliability analysis is shown in Fig. 5 by plotting the CCDF. It can be observed from the CCDF plot that DDQN with Thompson sampling reduces the tail risk of the URLLC outage probability. The proposed Algorithm 3 guarantees that its values do not violate the threshold η, whereas Algorithm 2 violates the reliability threshold. Our proposed method adjusts the weight parameters according to the behavior of the URLLC traffic, which helps to achieve reliable URLLC transmission. In contrast, Algorithm 2 fails to guarantee the strict URLLC reliability requirements due to its inability to adjust to channel variations. The Q-learning based method converges slowly and struggles to find the optimal policy for the stringent URLLC service, which results in poor performance. It can be observed from Fig. 5 that Algorithm 2 performs poorly in terms of outage probability: at a threshold value of 0.037, its violation probability is around 0.13.
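The CCDF used in this comparison is straightforward to compute from simulated outage probabilities. The following is a minimal sketch, assuming a list of per-TTI outage values; the sample values are made up for illustration and are not taken from the paper's results.

```python
def empirical_ccdf(samples, threshold):
    """Fraction of samples strictly greater than threshold, i.e. P(X > threshold)."""
    return sum(1 for x in samples if x > threshold) / len(samples)


def ccdf_curve(samples):
    """Return (value, P(X > value)) pairs over the sorted distinct sample values."""
    return [(v, empirical_ccdf(samples, v)) for v in sorted(set(samples))]


# Hypothetical outage probabilities observed over eight TTIs.
outage = [0.010, 0.020, 0.025, 0.030, 0.035, 0.036, 0.040, 0.050]
violation_probability = empirical_ccdf(outage, threshold=0.037)
print(violation_probability)
print(ccdf_curve(outage))
```

Reading the curve at the reliability threshold gives the violation probability directly, which is the quantity the text quotes for the threshold value of 0.037.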

D. CONVERGENCE ANALYSIS
Next, we analyze the convergence behavior of the proposed approach and compare it with the centralized method, where every user has complete awareness of the environment. In this case, a single agent makes the decisions for all agents, which increases the dimension of the action space and affects the convergence rate. In Fig. 6, we plot the reward value over a number of episodes. It can be seen that the centralized method converges poorly at first and only converges after some episodes. In contrast, the proposed approach based on DDQN coupled with Thompson sampling performs better than the centralized approach: it converges quickly from the beginning and achieves a better reward value. So, our proposed method performs better in a heterogeneous environment and finds an optimal policy with a fast convergence rate.

E. EMBB RELIABILITY ANALYSIS
Due to incoming URLLC traffic, it is necessary to analyze the reliability of the eMBB service. The reliability of eMBB is determined by calculating the number of eMBB users who achieve a data rate higher than a specific target rate ($R_{min}$) and dividing it by the total number of eMBB users. This helps us determine the percentage of eMBB users who experience satisfactory service levels in a particular scenario characterized by specific channel conditions and URLLC traffic.
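The eMBB reliability metric defined above reduces to a simple ratio. The following is a minimal sketch; the function name `embb_reliability` and the per-user rates are hypothetical.

```python
def embb_reliability(rates_mbps, r_min_mbps):
    """Fraction of eMBB users whose achieved rate exceeds the target rate R_min."""
    satisfied = sum(1 for rate in rates_mbps if rate > r_min_mbps)
    return satisfied / len(rates_mbps)


# Hypothetical per-user eMBB rates (Mbps) after puncturing.
rates = [10.0, 20.0, 35.0, 40.0]
print(embb_reliability(rates, r_min_mbps=15.0))
print(embb_reliability(rates, r_min_mbps=30.0))
```

Raising the target rate can only keep the metric equal or lower it, which matches the trend reported when $R_{min}$ is increased from 15 Mbps to 30 Mbps.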
It can be seen in Fig. 7 that the PGACL-based risk-averse formulation and the proposed method achieve higher reliability. The PGACL-based risk-averse formulation performs better than the other schemes because its variance-aware formulation punctures only those eMBB users with higher SNR, which results in better reliability of the eMBB service. However, our proposed algorithm achieves eMBB reliability comparable to PGACL and a much higher sum rate, because the URLLC service is scheduled over eMBB time slots according to a cost function that increases the sum rate of the system while ensuring the QoS requirements of users associated with the eMBB service. The proposed approach ensures the eMBB reliability by efficiently finding the optimal policy of radio resource management. Furthermore, it can be noticed that as the target data rate $R_{min}$ increases, the eMBB reliability decreases. The proposed algorithm and the PGACL-based risk-averse formulation keep the reliability at almost 90% when the target data rate is 15 Mbps, while Q-learning fails to keep a tolerable eMBB reliability. When the target data rate is increased to $R_{min}$ = 30 Mbps, the reliability achieved by the proposed algorithm and the PGACL-based risk-averse approach is near 80%, while the other schemes fail to achieve a tolerable reliability. This is because the agent in our proposed approach allocates the RBs to the users which have higher SINR and meet the objective function. It can also be seen that as the number of URLLC users increases, the reliability of the eMBB service decreases, because more eMBB slots need to be punctured, which affects the eMBB reliability.

F. EMBB RATE PERFORMANCE
We study the effect of puncturing on the eMBB data rate and compare the results with other methods for various loads of incoming URLLC traffic by plotting the average data rate of the eMBB service. In Fig. 8, it can be seen that the incoming URLLC traffic affects the data rate of the eMBB service, because the users associated with the URLLC service are given priority and more radio resources are assigned to URLLC users in order to meet the stringent latency requirements of the URLLC service. Furthermore, compared to the other methods, the proposed algorithm achieves a higher average data rate for eMBB users, up to 48 Mbps when the URLLC load is 45. The random search policy performs poorly because it searches for the policy randomly and is based on a simple architecture. The PGACL-based risk-averse approach achieves a lower average data rate than DQN and the proposed algorithm. The proposed approach achieves a higher average data rate at the beginning of the arrival of URLLC traffic; its rate decreases as the arrival rate of URLLC traffic increases, but it still keeps a higher average data rate than the other methods.
Table 3 provides a summary of the ML-based methods employed to address the resource management problem. The table highlights the convergence behaviour, variance, QoS, communication overhead, and data requirements of each approach. It also acknowledges the scalability in large networks. The other approaches, such as Q-learning, DQN, PGACL, and DNN, are compared based on the above-mentioned features in the context of reinforcement learning and resource management.

VIII. CONCLUSION AND FUTURE WORK
In this work, we have analyzed the issues related to the coexistence of eMBB and URLLC services in 5G and beyond networks. Using the puncturing technique, we proposed an efficient framework to ensure the capacity and reliability of the system while meeting the low-latency requirements. Moreover, we have employed ML-based algorithms such as semi-supervised and DRL methods to solve the complex optimization problems in real time in order to allocate the resources intelligently. A co-training method of semi-supervised learning is used in the RB allocation strategy phase. We have addressed the URLLC scheduling sub-problem by proposing a DRL-based DDQN approach with Thompson sampling to meet the latency and reliability requirements and to intelligently manage the URLLC traffic over the punctured eMBB slots. The simulation results verified that the proposed algorithms fulfill the reliability requirements of URLLC users while simultaneously ensuring the reliability and achieving a higher average sum rate for eMBB users.

Training the CDRL model can be computationally intensive and time-consuming. In particular, the convergence time of the algorithm may be lengthy, especially in complex network scenarios. Hence, balancing the need for accurate optimization and real-time decision-making can pose a challenge. Furthermore, the integration of intelligent resource management algorithms may introduce additional communication overhead to the network. This could be due to the exchange of information between network elements, coordination mechanisms, or feedback loops, potentially affecting overall network performance and efficiency. In the future, we plan to explore the applications of advanced ML to address these challenges and limitations.
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3288698. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

TABLE 1: List of Notations
Notation                  Definition
B                         Set of BS
$W^{e}_{b}$, $W^{u}_{b}$          Set of eMBB and URLLC users, respectively
K                         Total number of mini-slots
N                         Total number of RBs
$\xi^{b,w}_{n,k}(t)$            Puncturing decision variable
B                         Initial assigned bandwidth to eMBB users
$a^{b}_{w,n}$                 RB allocation strategy
$\zeta^{e,w}_{b,n}(t)$          Signal-to-Interference-Noise-Ratio (SINR) of eMBB users
$\zeta^{u,w}_{b,n}(t)$          SINR of URLLC users
$p^{e,w}_{b,n}$               eMBB transmitted power
$p^{u,w}_{b,n}$               URLLC transmitted power
$g^{e,w}_{b,n}$               eMBB channel gain
$g^{u,w}_{b,n}$               URLLC channel gain
$r^{e}_{b,n}(t)$              Sum rate of eMBB users
$r^{u,w}_{b,n}(t)$            Sum rate of URLLC users
$Y^{u,w}_{b,n}$               Dispersion of the channel
$\psi(t)$

FIGURE 1: System Model

where κ refers to the packet size of the URLLC service. The above equation indicates that the outage probability should not exceed the threshold value η. So, the optimization problem of joint resource allocation of eMBB and URLLC can be mathematically formulated as follows:

P1:
$\sum_{w \in W^{e}} a^{b}_{w,n}(t) \le 1, \quad \forall n \in N,\ b \in B$ (12b)
$a^{b}_{w,n}(t) \in \{0, 1\}, \quad \forall w \in W^{e},\ n \in N$ (12c)

We propose a novel CDRL approach, where we use DRL with a semi-supervised co-training method to predict the resource block for each user associated with the eMBB slice. First, we transform P1 in (12) into a loss function and then obtain the solution of the RB allocation by minimizing the loss function such that:

$\min_{\hat{A}} \; \| \hat{A} - \arg\max r^{e}_{b,w} \|^{2}$
$\sum_{w \in W^{e}} a^{b}_{w,n}(t) \le 1, \quad \forall n \in N,\ b \in B$
$a^{b}_{w,n}(t) \in \{0, 1\}, \quad \forall w \in W^{e},\ n \in N$

Algorithm 1: Initial RB allocation strategy based on two-sided matching method
1: RB allocation A is initialized
2: for a BS b from the set of BS B do
3:
7: Select the RB (n) with the highest signal-to-interference-noise-ratio (SINR) based on channel quality indicator (CQI)


1: Input: Labeled samples of RB allocation $d_l$, labeled validation samples $d'_l$, unlabeled sub-samples $\mho_u$;
2: for t = 1 to T do
3: for j = 1 to 2 do
4: Train $Z_{t,j}$ with labeled samples $d_l$;
5:
6: Use $Z_1$ to label the sub-samples $\mho'_u$;
7: Upgrade $Z_2$ with pseudo-labeled sub-samples $\mho'_u$, and labeled samples $d_l$;
8: $Z_2$ is used to label the sub-samples $\mho'_u$;
9: Upgrade $Z_1$ with pseudo-labeled sub-samples $\mho'_u$, and labeled samples $d_l$;
10: Determine the reward $r_t$ based on validation labeled samples $d'_l$;


FIGURE 3: Block diagram of the proposed framework.

8: Update mean $w_{a_t}$ and covariance $\mathrm{Cov}_{a}$ of posterior

FIGURE 4: CDF of the eMBB sum rate obtained by different schemes.

(a) $R_{min}$ = 15 Mbps (b) $R_{min}$ = 30 Mbps