Device and Network Coordination for Opportunistic Utilization of Radio Resources in 3D Networks

Device and network coordination is critical for efficient radio resource (RR) utilization while meeting Quality of Service (QoS) requirements in heavily congested future heterogeneous wireless networks featuring 3-Dimensional (3D) small cells (SCs). Opportunistic and coordinated use of RRs in distinct bands, assisted by device and network coordination, could dramatically improve spectrum utilization in these networks. This study addresses overall communication performance enhancement through better utilization of opportunistically available, spatially distributed RRs in a 3D SC, considering two co-located networks operated in the licensed band (LB) and the unlicensed band (UB) while jointly accounting for related factors such as the 3D spatial positions and QoS requirements of the devices. To address this problem, a device and network coordination assisted solution is developed using Q-learning and Slotted-ALOHA principles. Then, to maintain performance standards, device and network coordination aided scheduling, power control and access prioritization schemes are discussed. Subsequently, a regret based learning assisted algorithm is presented for the UB to optimally utilize RRs. In these solutions, both device-network and network-network interactions are considered. The results show approximately 75% better overall coordination efficiency than conventional methods at the initial iterations for the scenarios with the highest device density.


I. INTRODUCTION
With the emergence of the Internet of Everything (IoE) and the ongoing industry transformation, billions of machines and sensors are expected to be connected to the Internet through wireless networks, leading to a radical paradigm shift from the rate-centric enhanced mobile broadband services of previous years to information-centric, ultra-reliable, low-latency, intelligent communication and computing services [1], [2]. In contrast to the physical objects in the definition of the Internet of Things (IoT), four components, namely things, processes, data and people, are recognized as the key contributors in the definition of IoE in upcoming communication systems [3], [4]. On this path towards a smart world, even the number of conventional IoT devices is expected to grow by hundreds of billions by 2030 [2], exerting enormous pressure on the wireless infrastructure to serve dense wireless small cells (SCs) using scarce radio resources (RRs) more efficiently than ever before. Due to the massive number of devices and their priority and Quality of Service (QoS) requirements, the design and operation of wireless networks have become increasingly challenging tasks. In the emerging complex heterogeneous network (HetNet) environment, 3-Dimensional (3D) network rollout aspects [5] together with device and network coordination schemes need to be thoroughly studied for better utilization of critically limited RRs.
With the increase of network densification and service diversification, there is a significant hike in the number of devices seeking network access through always scarce RRs in a densely dispersed 3D space. In addition, due to limitations of the RRs in the licensed band (LB), unlicensed band (UB) is also recognized as a source of opportunistically accessible RRs for the cellular networks [6] while sharing it with the other non-cellular network users [7]. Furthermore, these spatially distributed location-specific RRs [8] are to be utilized more accurately and even better than in 2-Dimensional (2D) networks avoiding the assumption used to develop techniques for the 2D networks that the RRs are constant over the third spatial dimension. On the other hand, devices are with disparate operational conditions and QoS needs while intensely competing for the RRs and network access. In this case, it is necessary to have more informative device and network coordination assisted mechanisms for systematic exploration of third spatial dimension for service effective coverage planning, RR management, reduction of access delays and meeting the QoS aspects of the devices in an efficient and effective manner in the future HetNets.

A. RELATED WORK
One of the key initiatives in device and application prioritization mechanisms is to introduce a distributed scheduling and multiple access mechanism with a basic prioritization option for the devices [9]. However, most of these schemes are not supported with at least the simplest device and network coordination schemes like error feedback algorithms [10]. Moreover, these primary prioritization and feedback schemes have not identified the potential contributions that they could have received from the transmit power control mechanisms [11]. In this case, even though these prioritization schemes have evolved up to the 5th-Generation (5G) new radio (NR) networks [12], no connection can be seen between device prioritization and the initial transmit power of the devices.
One of the historical milestones in device and network coordination for initial RR allocation in wireless channels is the introduction of machine learning (ML) assisted S-ALOHA schemes to solve the random access channel (RACH) congestion problem in 2012 [13], where the coordination is established with down-link (DL) coordination frames. Subsequently, this solution was further improved with Q-learning (QL), leading to sub-optimum solutions in [14], [15]. RACH access in the up-link (UL) is the first occasion where RRs become opportunistic for the devices seeking communication in the LB. Since future 3D networks [5] are going to be service and application oriented, information centric, well-coordinated intelligent entities, it is necessary to consider device prioritization, QoS and 3D location-specific RR information within the initial device and network coordination and RR utilization processes. However, in some studies, the 3D spatial positions of the devices are considered in certain problems such as UL power control for unmanned aerial vehicle (UAV) aided communications [16].
Traffic offloading to the UB through opportunistic utilization of RRs [17], [18], [19] using NR-Unlicensed (NR-U) [17] and LTE-Unlicensed (LTE-U) [20] concepts is a very common solution for the problem of scarcity of radio spectrum in the LBs rather than employing spectrum sharing and cognitive radio (CR) concepts [21]. In [18], [19], random access and coordinated scheduling schemes for UL of a Long-Term Evolution (LTE) license assisted access system are compared without accounting for channel access delays. Total throughput of both DL and UL NR-U system is maximized while ensuring fair coexistence with a WiFi network in [17]. Further, in [22] listen before talk options for resource utilization in NR-U is discussed in the presence of different wireless networks. In [23], solutions are given to mitigate the aliasing effects caused by pseudo paths in 5G NR and NR-U networks.
The main difference between the solutions presented in this study and RR utilization in CR [24], [25] is that the proposed opportunistic RR utilization algorithms focus on how to use the 3D spatially distributed RRs rather than on deciding whether a particular frequency band can be used or not. Furthermore, the availability of a frequency band is opportunistic in CR [24], [25], whereas channel access is opportunistic in the solution discussed in this study. In this investigation, attention is paid to 3D spatial domain utilization of the radio spectrum. Many of the solutions presented with regret based learning (RL) [26] and other techniques [6], [7], [20], [27] for dynamic RR management in LTE-U in 2D networks are highly inefficient, as they do not consider the 3D spatial distribution of location-specific RRs and do not address device access prioritization needs within the UL channel access problem.
A fully functional 3D network has at least four basic technical requirements: 1. use of a 3D coordinate system to identify the locations of the devices, 2. use of location-specific, spatially distributed 3D radio resources, 3. adherence to 3D planning concepts (including frequency and coverage planning) to serve the devices [5] and 4. employment of techniques that leverage 3D characteristics of the device distributions, the radio resources and/or the environment. In addition, efficient use of 3D spatial parameters should be reflected in some of the performance metrics as well. Any network that meets fewer than these four requirements is not identified as a fully functional and fully featured 3D network. Based on this definition, almost no work in the literature meets all of these conditions, with few exceptions [8]. All concepts, algorithms, theories and techniques that utilize one or more of these main components are recognized as supportive concepts, algorithms, theories and techniques for 3D networks. Some of the different use cases of the term 3D in relation to cellular wireless communication are given in Table I.

B. MOTIVATIONS AND CHALLENGES
There are several challenges to overcome in achieving highly efficient communication in 3D SC HetNets. These challenges include the coordinated use of location-specific opportunistic RRs in a precise and efficient manner, management of channel access [45], [46] while minimizing access collisions and the related delays (i.e., increasing the access probability through mitigation of collisions), and avoidance of access attempts for already allocated RRs. Due to the addition of a massive number of uncoordinated heterogeneous devices, these challenges remain at the forefront of future wireless network design. The situation becomes even worse as machine type devices can continuously compete for limited RRs in a small 3D space, even risking connections for the devices that need them most.
Other than the timely demand for 3D SC networks, this work is motivated by a number of other reasons. First, there is a need for a mechanism to account for the different priority requirements of the data types, services and devices in future 3D SC HetNets by recognizing their 3D location-specific properties and RRs. There should be enough provisions to represent numerous types of physical parameters (e.g., altitude of a device), QoS needs and priority requirements (e.g., priority for mission critical data) in the physical or MAC layer problem formulation, enabling them to be accounted for in subsequent solutions. Secondly, more efficient and faster channel access mechanisms are necessary for low-latency and reliable networks. Prominent delays in the networks arise in grant acquisition (5 ms), random access (9.5 ms), transmit time interval (1 ms), signal processing (3 ms), packet retransmission (8 ms) and the core network/Internet (varies widely) [47], [48], where random access mechanisms have significantly high delays. Furthermore, most of the algorithms in cellular networks are executed at the base stations (BSs), resulting not only in large waiting times for RACH access and RR utilization [18], [19] but also in heavy UL access interference [49], [50]. These delays can be reduced with the use of intelligent, proactive, independent and coordinated devices. Thirdly, RR and interference management should be supported by utilizing opportunistically available RRs in different networks operated in distinct frequency bands. The fourth factor is to promote an environment and situation aware, self-configurable technical background for future intelligent HetNets consisting of proactive devices.
In the past, device density in the 3D space was lower due to the small number of devices and low flying objects [37], [51], leading to comparatively low demand for RRs. Under that condition, there was no significant scarcity of RRs, and many of the challenges and motivations discussed in this study were not at the forefront of cellular wireless network design. With the increase of device density and the demand for high throughput rates, cellular networks were compelled to use high frequency bands with reduced communication distances, leading to dense SCs. In this environment, RRs are going to be extremely scarce and must be used very carefully. However, even in this situation, with 2D network design and operation principles, approximate site specific path loss parameters are used on most occasions for allocation and utilization of RRs. In a highly dense environment, the errors and performance degradation caused by inaccurate and inefficient use of RRs with these approximate parameters are no longer negligible. As a solution to this problem, the third spatial dimension is suggested to be used for cellular network planning and RR management together with spatially distributed location-specific path loss parameters, leading to better performing 3D SC HetNets through efficient and more accurate use of RRs.
This study is developed around the overall objective of efficient utilization of opportunistically available LB and UB RRs in 3D cellular networks through device and network coordination, while prioritizing the requirements of the devices and their associated services by identifying some of them as location-specific, 3D spatially distributed parameters. Here, device and network coordination is defined by two background interaction processes that support the main communication, viz. the interaction between the devices and the serving BS or access point (AP), and the inter-network interaction between the co-located BS and AP. In addition, several congested locations in a SC are identified as critical areas (e.g., operating theaters), where some of the nearby devices are allowed to use historical data on devices and services in those regions during the process of channel access. To achieve the objective of this study, device and network coordination assisted learning based schemes are developed as the solutions using the principles of QL [52], S-ALOHA [53] and RL [26]. Moreover, these solutions are capable of accounting for multiple prioritization needs of the devices operated in both the LB and UB, and they also address the conventional RACH access problem in congested environments. The success of the solutions is evaluated using several performance metrics such as overall coordination efficiency, sum weighted volume capacity, network access delay and channel occupancy. Furthermore, the solutions have several general advantages such as the capacity to be operated in real-time, fast convergence and low processing power consumption.

C. CONTRIBUTIONS
The contributions of this study span several directions in addressing problems related to 3D RR utilization through device and network coordination while accounting for the QoS and access prioritization needs of the devices. Compared to the existing work, this study fully adheres to the definition of a fully functional 3D network by including the four main components within the design and implementation phases. The main contributions are summarized as follows:
• To maintain performance standards and meet QoS requirements, we introduce device and network coordination assisted scheduling, transmit power control and prioritized access granting schemes. In contrast to conventional approaches, the 3D spatial positions of the devices and several other related factors are used in these schemes to achieve better performance. Several challenges such as maximizing the access probability and minimizing the waiting time for prioritized devices are also addressed with this solution.
• To expedite the process of UL RACH access and to increase the utilization efficiency of opportunistic 3D RRs in the LB and UB, device and network coordination assisted learning algorithms are presented. In contrast to previous studies, simultaneous access of LB and UB 3D RRs is considered with inter-network and device-network coordination. At the same time, the challenges of increasing situation awareness, fast adaptation to the environment and coordinated use of opportunistic RRs are also addressed.
• To further enhance the utilization efficiency of opportunistic 3D RRs in the UB, an RL algorithm is discussed while addressing the challenges of making the devices more proactive and ensuring fair use of UB RRs.
• For the purpose of implementation, functional protocols are discussed for the operations of the devices, BSs, algorithms and mechanisms used in the LB and UB networks while addressing the challenge related to data and information exchange among different entities.
The rest of the paper is organized as follows. The system model and problem formulation are presented in Section II. In Section III, coordinated learning schemes for opportunistic RR utilization are discussed. Simulation results are analyzed in Section IV, and the paper is concluded in Section V.
Notations: The independently and identically distributed (iid) real and complex Gaussian distributions with mean $\mu$ and per-dimension variance $\sigma^2/2$ are represented by $\mathcal{N}(\mu, \sigma^2)$ and $\mathcal{N}_c(\mu, \sigma^2)$, respectively. The argument of the maximum value is given by $\arg\max(\cdot)$.

II. SYSTEM MODEL AND PROBLEM FORMULATION
A deployment setup of three 3D SCs of a multi-layer 3D cellular network is shown in Fig. 1. Each SC has M randomly dropped devices indexed with m, and they are served by a BS in the center of the cell, where every BS is co-located with a WiFi AP as well. The BSs have the capacity to handle both 5G NR and NR-U traffic through the LB and UB, respectively. Even though the solutions presented in this study are highly compatible with the LTE [54] and LTE-U [20] systems, these systems are not considered in the system model as they are going to be obsolete and fully replaced with 5G NR and NR-U systems in the near future. In addition, for simplicity of the presentation and the study, only the middle cell is considered for the subsequent derivations and calculations. Some of the important symbols used in the system model and problem formulation, together with some of the performance metrics, are summarized in Table II. In the fading channel model for the device m, $(x_m, y_m, z_m)$ are the Cartesian coordinates of the location of that device. The floating intercept, line slope parameter and lognormal shadowing for the path loss model are given by $\alpha_{(x,y,z)}$, $\beta_{(x,y,z)}$ and $\xi_{(x,y,z)}$, accordingly [8]. Conventional site specific path loss parameters used in 2D networks are not indexed with Cartesian coordinates as they are considered to be constant for a particular cell. The corresponding site specific parameters for $\alpha_{(x,y,z)}$, $\beta_{(x,y,z)}$ and $\xi_{(x,y,z)}$ are given as $\alpha$, $\beta$ and $\xi$ [55], accordingly. However, depending on the scenario, there are certain provisions to model the shadowing parameter as a sub-region or sub-site specific parameter [55]. For simplicity of the presentation, these location-specific and frequency dependent path loss parameters [8] are not indexed with the frequency band. In Fig. 1, the location-specific path loss parameters and the common site specific path loss parameters based on conventional methods are marked.
However, both locations of the devices and the path loss parameters related to those locations are assumed to be known. It is considered that the channel is constant over a period of a frame and the channel information can be estimated perfectly.
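The contrast between location-specific and site specific parameters can be illustrated with a minimal sketch. It assumes the commonly used floating-intercept form PL(d) = α + 10 β log10(d) + ξ in dB, which is not restated in the text above; the grid size, the parameter values and the helper names (`params_at`, `device_path_loss_db`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Hypothetical parameter values, for illustration only.
SITE_PARAMS = (75.0, 3.5, 0.0)  # (alpha [dB], beta, shadowing std [dB]) for the whole cell
LOCATION_PARAMS = {             # per 3D grid cell, keyed by (ix, iy, iz)
    (1, 0, 2): (70.0, 3.0, 0.0),
}
GRID_M = 10.0                   # assumed edge length of one grid cell in metres


def path_loss_db(distance_m, alpha, beta, sigma_db=0.0, rng=random):
    """Floating-intercept path loss: alpha + 10*beta*log10(d) + xi, xi ~ N(0, sigma^2)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    xi = rng.gauss(0.0, sigma_db) if sigma_db > 0 else 0.0
    return alpha + 10.0 * beta * math.log10(distance_m) + xi


def params_at(position):
    """Return location-specific parameters if known, else the site specific ones."""
    cell = tuple(int(c // GRID_M) for c in position)
    return LOCATION_PARAMS.get(cell, SITE_PARAMS)


def device_path_loss_db(position, bs_position=(0.0, 0.0, 0.0)):
    """Path loss of a device at (x, y, z) towards the BS, using the 3D distance."""
    alpha, beta, sigma_db = params_at(position)
    return path_loss_db(math.dist(position, bs_position), alpha, beta, sigma_db)
```

A device whose grid cell has its own entry is evaluated with that entry, while every other device falls back to the single site specific triple, which is the 2D-style approximation discussed above.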
In addition, a finite set of critical or priority areas is considered, created by identifying the requirements of the applications, data types and devices. Operating theaters, fire routes and entrances to emergency rooms in hospitals are some examples. Depending on the applications, devices in and near those areas may need stable radio links, access prioritization and maintenance of high QoS standards. Those devices are therefore allowed to use historical data from the network during the process of channel access, initial resource allocation and algorithm training, which reduces information processing and channel access delays. In this case, it is reasonable to consider that certain device related characteristics are also 3D spatially distributed parameters.
The observed receive signal at time t, y m [t], with AWGN s is the UL transmit power of the device m with power control step s. Since noise and signal are uncorrelated, the receive signal power p m s,R is given as In practice, receive signal strength is measured including noise power η 2 and it cannot be further purified to get only p RC s,m . In this case, p m s,R is used as the receive signal power for the subsequent calculations. Then, throughput volume capacity C V m of device m is given by [34], (36)] where W , d r , M and t U are bandwidth of the radio link, radius of the SC, set of devices in the SC and the time allocated for UL transmission, respectively. Since 3D SCs are considered for this study, in contrast to 2D cellular network designs, volume capacity is used as a performance metric where it is defined as the throughput capacity for a given unit volume in 3D space. Then, the maximum interference limit I max for the instantaneous interference p m s,R is established as

A. DATA PRIORITIZATION IN 3D NETWORKS
A brief overview of a very high-level architecture of a 3D SC HetNet containing a set of co-located UB enabled SC BS and AP combinations is given in Fig. 2. This architecture is developed based on the principles presented in [56], [57]. The coordination operations are managed by the UB enabled BS, and the coordination interaction between each BS-AP pair is indicated by double-headed dashed arrows. In this network, the types of data associated with the devices are considered as their prioritization factor. There could be mission critical data such as real-time information exchange within a tele-surgery at a lower layer and control data of a UAV at a higher layer. Delays related to them should be minimized while increasing the reliability of the transmissions [47]. In this multilayer 3D SC network, layer 1, or the ground level, contains the BSs, APs and associated devices with the lowest elevation. Basically, all the devices on the ground are included in this layer. The BSs, APs and associated devices with the highest elevation are referred to as layer n, or the top layer. There can be several other layers in between layer 1 and layer n based on their separation distance.
For a given device at a certain altitude, different types of data are expected in various quantities, and they are handled in four sequential stages:
1) Classification: Data is classified on arrival using the headers, from multimedia to control and emergency data. Then it is placed in temporary buffers.
2) Quantification: The quantity of data is determined based on the time a temporary buffer of a device is held "full". If the duration is long, the volume of that data is also considered to be high.
3) Prioritization: Data is prioritized considering the data volume, the importance of the data, the elevation of the device and the availability of RRs in an alternative band, as given in Fig. 3. Priorities could vary according to the elevation of a device as well.
4) Queuing: Data is buffered until it is transferred to the BS.
Based on this approach, the 3D spatial distribution of different services, their QoS needs and their priority requirements are well recognized.
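The four stages can be sketched as a small pipeline. This is only an illustrative sketch: the class ranks, the buffer-full threshold, the scoring coefficients and the names `classify`, `quantify`, `priority` and `UplinkQueue` are assumptions made here and are not taken from the paper or from Fig. 3.

```python
import heapq
from itertools import count

# Hypothetical class ranks: a larger value means more important data.
CLASS_RANK = {"multimedia": 1, "control": 2, "emergency": 3}


def classify(header):
    """Stage 1: map a packet header to a data class (unknown types become multimedia)."""
    kind = header.get("type")
    return kind if kind in CLASS_RANK else "multimedia"


def quantify(buffer_full_s, threshold_s=0.5):
    """Stage 2: a long buffer-full duration is read as a high data volume."""
    return "high" if buffer_full_s >= threshold_s else "low"


def priority(kind, volume, elevation_m, alt_band_available):
    """Stage 3: combine the four factors into one score (illustrative coefficients)."""
    score = 10.0 * CLASS_RANK[kind]
    score += 2.0 if volume == "high" else 0.0
    score += 0.01 * elevation_m
    score -= 1.0 if alt_band_available else 0.0
    return score


class UplinkQueue:
    """Stage 4: hold data until it is transferred to the BS, highest score first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that keeps arrival order for equal scores

    def push(self, payload, score):
        heapq.heappush(self._heap, (-score, next(self._seq), payload))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

A heap is used so that the next item to send is always the one with the highest score, without re-sorting the whole buffer on every arrival.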

B. MEDIUM ACCESS WITH OPPORTUNISTIC RRS
The devices are allowed to connect to the BS using a frame based S-ALOHA scheme [15]. There are K slots in a frame, indexed with k, where a device can bid for a slot or a subframe in a data frame. Even though the LB is allocated to a particular operator and then to a cell through frequency planning, the availability of time slots is entirely opportunistic for the devices during the process of UL RACH access. Initially, all the devices are allowed to access the UL time slots of the LB. Once all the time slots are filled, access to the opportunistically available RRs in the UB is initiated. Within the final study scenario, devices are allowed to access both bands in parallel.
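The sequential access scenario (LB first, then UB) can be sketched as a toy simulation. It is a simplified illustration and not the scheme of [15]: the per-frame bidding probability `p_tx` is an assumption added here so that contention for the last free slot can resolve, and the function names are hypothetical.

```python
import random


def run_frame(waiting, free_slots, rng, p_tx=0.5):
    """One S-ALOHA frame. Each waiting device bids with probability p_tx for a
    randomly chosen free slot; a slot is won only if exactly one device bid for it."""
    bids = {}
    for dev in waiting:
        if free_slots and rng.random() < p_tx:
            bids.setdefault(rng.choice(free_slots), []).append(dev)
    return {slot: devs[0] for slot, devs in bids.items() if len(devs) == 1}


def access(devices, k_slots, rng, max_frames=1000):
    """Fill the K slots of the LB first, then the K slots of the UB.

    Returns a mapping {device: (band, slot)} for the devices that were granted access.
    """
    granted, waiting = {}, list(devices)
    for band in ("LB", "UB"):
        free = list(range(k_slots))
        for _ in range(max_frames):
            if not waiting or not free:
                break
            for slot, dev in run_frame(waiting, free, rng).items():
                granted[dev] = (band, slot)
                waiting.remove(dev)
                free.remove(slot)
    return granted
```

For example, `access(range(6), 4, random.Random(0))` places four devices in LB slots and moves the remaining two to the UB once the LB frame is full.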

C. DECISIONS WITH WEIGHTED VOLUME CHANNEL CAPACITY
In selection of devices to grant channel access, weighting factors are collectively defined by the devices and the BS where there are N w weights for the device m at (x, y, z). These weights are adjusted by the BS based on the priority requirements of the devices. As an example, w 1,m (x,y,z) can be proportionately adjusted according to the altitude of a device z m . Let w 2,m (x,y,z) be adjusted according to priority of data which is defined using current or historical data related to the nearest critical area as given in the horizontal arrows in Fig. 3. w 3,m (x,y,z) can be based on availability of an alternative band for communication. If a band is available, the weight is set to 0 and some other value otherwise. In this way prioritized channel access problem is converted to a RACH access problem supported with weighted volume channel capacity values.
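As a rough illustration of how the three example weights could feed a weighted volume capacity, consider the sketch below. The text does not state how the individual weights are combined, so summing them is purely an assumption made here, as are the normalization by a maximum altitude `z_max`, the default value used when no alternative band exists, and both function names.

```python
def access_weights(z_m, z_max, data_priority, alt_band_available, w3_no_alt=1.0):
    """Return the three example weights (w1, w2, w3) for one device."""
    w1 = z_m / z_max                               # proportional to the device altitude
    w2 = data_priority                             # from current or historical data
    w3 = 0.0 if alt_band_available else w3_no_alt  # 0 when an alternative band exists
    return (w1, w2, w3)


def weighted_volume_capacity(weights, c_v):
    """ASSUMPTION: the weights are summed before scaling the volume capacity."""
    return sum(weights) * c_v
```

With this convention, a device that can fall back to another band contributes nothing through the third weight and therefore bids with a lower weighted value than an otherwise identical device that has no alternative.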

D. PROBLEM STATEMENT
There are two main components of the problem viz. acquisition of opportunistic RRs in any of the bands and efficient utilization of RRs in UB. With the first problem, devices face the same RACH access problem.

1) ACQUISITION OF OPPORTUNISTIC RRS
In this study the problem of performance enhancement through better utilization of opportunistically available spatially distributed RRs in a 3D SC HetNet is addressed. Conventionally, in LB, it is known as RACH access congestion or RACH access problem. In this study, RRs in both LB and UB are utilized for the same problem while jointly accounting for several other important communication related factors like 3D spatial positions of the devices, their priority needs and the data types.

2) EFFICIENT UTILIZATION OF RRS IN UB
In contrast to the LB, the UB offers greater flexibility in varying both the transmit power and the time for communication. Therefore, the problem of efficient use of opportunistically available UB RRs for communication is addressed separately from the LB. For that, an optimization problem is formulated to maximize the sum weighted volume capacity $C^{W,V}_{Sum}$ of a SC considering those RRs as variables subject to a set of constraints, where $M_U$ and $\mathcal{M}_U$ are the number and the set of access granted devices in the UL of the UB, accordingly. The transmit time with time control step $q$ for the device $m$ is given by $t^{m}_{q}$, $q \in \mathcal{Q}$, where $\mathcal{Q} = \{1, 2, \ldots, q, \ldots, Q\}$ is the set of time steps with the highest step $Q$. The lower and the upper limits for $t^{m}_{q}$ are controlled by (3b) and (3f) with the minimum and the maximum allowed UL transmission times $t_{\min}$ and $t_{\max}$, accordingly. Similarly, the lower and the upper limits for $p^{m}_{s}$ are managed by (3c) and (3d) with the minimum and the maximum allowed UL transmission power values $P_{\min}$ and $P_{\max}$, respectively. QoS needs are maintained by (3e) with $C^{V}_{\min}$, the minimum volume capacity that the device $m$ is required to achieve. Furthermore, interference to the BS from the other devices is controlled with (3g) as discussed in (2).
There are several purposes in addressing these problems. They include facilitating fast channel access for the devices, better serving their applications with enhanced sum volume capacity values and guaranteeing QoS through efficient RR management. The technical challenges associated with these problems include exchanging information among waiting devices, selecting the best devices to grant channel access, utilizing location-specific and other related information to obtain priority in communication, and utilizing the opportunistically available UB in a fair and efficient manner.

III. COORDINATED LEARNING SCHEMES FOR OPPORTUNISTIC RR UTILIZATION
The principles of reinforcement learning are mainly used in developing device and network coordination assisted solutions for the problems of efficient RR management in 3D SCs. Their proven performance in real-time operations over conventional approaches, even without prior knowledge of the environment [15], is the main reason for selecting them. Since they adapt quickly to the operational environment, these model-free, low-complexity algorithms are capable of converging to sub-optimal solutions within a limited number of iterations. In addition, they require less energy, processing power and device memory while leading to the best solutions under model unaware complex environments such as the HetNet 3D SCs considered in this study. Therefore, these reinforcement learning based solutions are identified as very effective, stable and efficient approaches compared with conventional methods and even data driven learning techniques. Compared to traditional reactive wireless devices, the devices equipped with these solutions become intelligent and proactive. A summary of the symbols, functions and notations used in the solutions developed in this study is given in Table III.

A. DEVICE AND NETWORK COORDINATION
To establish device and network coordination, a DL broadcast frame is sent on the LB at the end of each UL frame [13], [15]. In contrast to conventional approaches, this frame carries not only information on the occupancy of UL time slots but also device and network coordination information such as instructions and data on switching the band, initial power control and critical areas. In addition, the co-located AP is used to establish the coordination among the wireless cellular NR, NR-U and WiFi networks. For accessing the UB, the devices are likewise coordinated through another DL broadcast information frame sent through the LB. Instructions on managing the transmission of the AP are sent to it by the BS, as indicated by the double-headed dashed arrows in Fig. 2, according to the basic network coordination principles [56], [57]. To utilize RRs in the UB, a duty cycle based periodic access mechanism is suggested for the UL of the NR-U system. Communication through the AP is disabled for WiFi devices when the duty cycle is on for the mobile devices. In certain scenarios, both bands are allowed to be accessed simultaneously, however, not by the same device. To avoid undue collisions, transmission of a UL frame by any of the devices on any of the bands is allowed only through an empty slot found after listening to a DL broadcast frame. Critical or priority areas are determined by the BS using historical data on both networks. Information related to these areas is shared with the devices, enabling it to be used appropriately by both the BS and the devices in those areas.
By using a DL broadcast frame, one of the main challenges in data exchange is mitigated. With the availability of location-based data through this frame, the challenges of situation and location awareness and of fast adaptation to the situation and the environment are successfully addressed.

B. Q-LEARNING WITH NETWORK COORDINATION
In this study, a coordinated QL scheme is suggested as a solution for the RACH access problem in 3D SCs while efficiently utilizing the opportunistic RRs. This solution is further improved by jointly considering the device prioritization schemes and location-specific information. Compared to the algorithm in [15], time slots are granted based on $w^{m}_{(x,y,z)} C^{V}_{m}$ with the assistance of initial transmit power control. This QL algorithm, a model-free reinforcement learning scheme, is developed based on the agent-environment relationship with the action-reward function given by a Q-table [15], [52], where the environment can be described by a Markov Decision Process (MDP). At each time-step in this MDP, an agent in state $i$ takes an action $k \in K_A$, trying to maximize its own reward $r_t$ at time $t$, and reaches the next state $i'$, $\{i, i'\} \in I_S$, under a certain transition probability, where $I_S$ and $K_A$ are the sets of states and actions, accordingly. Considering a certain action selection policy $\pi_{QL}$ and state-action pair $\{i, k\}$, the Q-function and the optimum Q-function are given as $Q^{\pi_{QL}}(i, k)$ and $Q^{*}(i, k) = \max_{\pi_{QL}} Q(i, k)$, respectively. The action is selected based on the highest Q-value as $\pi_{QL}(i) = \arg\max_{k} Q(i, k)$. Then, the Q-value at iteration or time $t + 1$ could be updated with the standard QL rule as
$$Q_{t+1}(i, k) = (1 - \lambda)\, Q_{t}(i, k) + \lambda \left[ r_t + \gamma \max_{k'} Q_{t}(i', k') \right],$$
where $\lambda$, $\{i', k'\}$ and $\gamma$ are the learning rate, next state-action pair and the discount factor, accordingly.
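The generic Q-value update and greedy action selection described in this paragraph can be sketched as follows. This is the textbook tabular Q-learning rule, written with the learning rate λ (`lam`) and discount factor γ (`gamma`) named above; the nested-dictionary Q-table layout and the default parameter values are assumptions made for illustration, not the paper's implementation.

```python
def q_update(q, state, action, reward, next_state, lam=0.1, gamma=0.9):
    """Tabular Q-learning step:
    Q(i,k) <- (1 - lam) * Q(i,k) + lam * (r + gamma * max_k' Q(i',k')).
    """
    best_next = max(q[next_state].values())
    q[state][action] = (1.0 - lam) * q[state][action] + lam * (reward + gamma * best_next)
    return q[state][action]


def greedy_action(q, state):
    """Select the action with the highest Q-value in the given state."""
    return max(q[state], key=q[state].get)
```

Here `q` is a dictionary of the form `{state: {action: value}}`, so a device that treats time slots as actions would keep one inner dictionary entry per slot.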
In this QL solution, device m is allowed to select time slot k and Q-value is updated at the end of each frame as where reward function is defined as C V CAvg and R C (k) are the average volume capacity for the cell and penalty for a collision due to simultaneous packet access, respectively. When there are K time slots, the congestion report is defined as R C = 1 M [R (x,y,z) (m, 1), R (x,y,z) (m, 2), . . . , R (x,y,z) (m, k), . . . , R (x,y,z) (m, K )] and sent with the DL broadcast frame. As it is explained in Fig. 4, for the time slot or sub-frame k, R C (k) is set to 1 when the slot is not assigned to a device. With this design, challenges of minimization of collision in devices accessing the channel simultaneously, avoidance of accessing already occupied slots and undue congestions created by continuously attempting machine type devices are addressed.

1) MULTI-FACTOR PRIORITIZATION SCHEME FOR ACCESS GRANTING
Even though the devices and services are prioritized, solutions cannot be implemented while neglecting the link quality. In this case, to allocate time slots to the devices, a multi-factor, weighting assisted, situation aware, location dependent prioritization policy is used while considering QoS aspects. For this strategy, to determine the weighting factors $w^m_{(x,y,z)}$, the necessary information is sent to the BS by the devices together with their bids for time slots. In the absence of current data, these weights are calculated based on the historical data provided by devices that previously operated at the same location. A device in the vicinity of a critical area has the option to use the more favorable of the historical and current data sets. For a device to qualify for this process, the distance from that device to the center of the nearest critical area, $d^m_C$, must be less than or equal to the maximum allowed distance $d^C_{max}$, i.e., $d^m_C \leq d^C_{max}$. As indicated in Fig. 4, a given time slot may be accessed by a single device, by more than one device, or by no device with sufficient receive signal power to be detected at the BS. When multiple devices bid for a time slot, more than one device may be detected for that particular vacant slot. In that situation, the prioritization scheme is used to grant the slot to the device with the highest weighted volume capacity $w^m_{(x,y,z)} C^V_m$. If two or more devices have the same $w^m_{(x,y,z)} C^V_m$ value, access is given based on $z_m$. However, if those devices are at the same height, access is granted to a randomly selected device. If no device is identified, no action is needed. If one device is detected, access is granted to that device. Subsequently, the congestion report is generated accordingly.
If an alternative band is not available for a device, high priority is given to that device through weighting in order to secure a band and a time slot for it.
With this scheme, the technical challenges of giving priority to certain devices and avoiding long waiting times for channel access are addressed. In other words, solutions are provided for access delay minimization and access probability maximization for prioritized devices.
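The slot-granting rule of this subsection can be sketched in Python as follows. This is only an illustrative sketch: the `Bid` record and `grant_slot` function are hypothetical names, and the assumption that the height tie-break favors the device with the larger $z_m$ is ours (the text states only that access is given based on $z_m$).

```python
import random
from dataclasses import dataclass


@dataclass
class Bid:
    device_id: int
    weight: float            # weighting factor w of the device at its location
    volume_capacity: float   # volume capacity C^V of the device
    height: float            # z coordinate of the device


def grant_slot(bids, rng=random):
    """Return the id of the device granted the vacant slot, or None.

    Rule: highest weighted volume capacity wins; ties are broken by height
    (ASSUMPTION: the higher device wins), then by a random draw.
    """
    if not bids:
        return None  # no device detected, so no action is needed
    best_score = max(b.weight * b.volume_capacity for b in bids)
    tied = [b for b in bids if b.weight * b.volume_capacity == best_score]
    top_height = max(b.height for b in tied)
    tied = [b for b in tied if b.height == top_height]
    return rng.choice(tied).device_id


# Hypothetical bids: devices 1 and 2 tie on weighted capacity (5.0).
bids = [
    Bid(device_id=1, weight=0.5, volume_capacity=10.0, height=2.0),
    Bid(device_id=2, weight=1.0, volume_capacity=5.0, height=3.0),
    Bid(device_id=3, weight=0.2, volume_capacity=10.0, height=9.0),
]
winner = grant_slot(bids)
```

In this example the tie between devices 1 and 2 is resolved in favor of device 2 because it is at the greater height.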

2) SCHEME FOR TRANSMISSION DEVICE POWER ADJUSTMENT
To determine the initial transmission power, both current 3D location-based and historical prioritization information are considered. As explained in Algorithm 1, if a device is in the close vicinity of a critical area, it is allowed to select the higher of the two transmit power steps $s_m$ and $s_C$ at the very first attempt. Here, $s_m$ and $s_C$ are the power control step selected according to the self-estimated device priority level and the power control step recommended by the BS, respectively. The step $s_C$ is based on the historical data on the location, and that information is passed to the devices through the DL broadcast frame as a part of device and network coordination. On all other occasions, the power control step index $s$ is incremented by one ($s \leftarrow s + 1$), thereby increasing $p^m_s$. This process is continued until channel access is granted. The set of transmit power values is given as $P = \{p_1, p_2, p_3, \ldots, p^m_s, \ldots, p_S\}$ with $\mathcal{S} = \{1, 2, 3, \ldots, s, \ldots, S\}$, $S = |P|$, where $p_S$ is the highest transmit power. This approach differs clearly from many of the open-loop or closed-loop gradual power control mechanisms used at the devices, particularly in determining initial transmission power values [38], [58]. Moreover, this solution enables the devices to reach the most suitable initial transmit power values quickly and efficiently. In addition, instructions on transmit power adjustments based on the interference limits given in (2) are sent with the congestion report. In the LB, after access is granted, devices are allowed to adjust their power levels based on the same principle until the maximum safe limits are reached. Flowcharts for the operations of the devices and the BS are given in Fig. 5. Functions related to the transmission process at a device and at the BS are explained in Fig. 5(a) and Fig. 5(b), respectively.
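The initial power step selection described above can be illustrated with the short Python sketch below. It is not Algorithm 1 itself: the function name, the `access_granted` callback and the assumption that a device outside every critical area starts its ramp from step 1 are introduced here for illustration only.

```python
def select_power_step(s_m, s_c, d_m, d_max, num_steps, access_granted,
                      start_step=1):
    """Return (final_step, attempts) for one device.

    s_m: step from the self-estimated device priority level.
    s_c: step recommended by the BS from historical location data.
    d_m, d_max: distance to the nearest critical area and its allowed maximum.
    access_granted: hypothetical callback, True if a step obtains access.
    start_step: ASSUMED starting index for devices outside critical areas.
    """
    if d_m <= d_max:
        step = max(s_m, s_c)   # jump to the higher step at the first attempt
    else:
        step = start_step      # otherwise ramp up gradually
    step = min(step, num_steps)
    attempts = 1
    while not access_granted(step) and step < num_steps:
        step += 1              # s <- s + 1, i.e. next higher transmit power
        attempts += 1
    return step, attempts


def needs_step_4(step):
    """Hypothetical channel: access succeeds from power step 4 upwards."""
    return step >= 4


near = select_power_step(2, 3, d_m=3.0, d_max=4.0, num_steps=10,
                         access_granted=needs_step_4)
far = select_power_step(2, 3, d_m=10.0, d_max=4.0, num_steps=10,
                        access_granted=needs_step_4)
```

Under these assumptions the device near a critical area reaches the working step in two attempts, whereas the distant device needs four, which reflects the faster initial power setting that the scheme aims for.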

C. COMMUNICATION IN UB
In this solution, use of UB is considered as a form of utilization of opportunistically available RRs for communication when and only when the LB time slots are fully occupied. Furthermore, simultaneous access attempts or connections to the UB enabled BS through both bands by the same device are also not allowed. Due to frequency dependent nature of the path loss characteristics, a separate set of location-specific values are expected for the path loss parameters in the UB [8]. In addition, this band cannot be fully occupied by the devices operated through NR-U channels and should be shared with the other wireless networks like WiFi [27] in a fair manner.
In this scenario also, devices are allowed to decide on channel access using the QL algorithm and to select the most suitable transmit power values using Algorithm 1. In conventional approaches, UB channel access is entirely managed by UB enabled BSs, with no proactive devices [6]. This band is accessed through an NR-U interface according to a time division duplex (TDD) frame of the same length as that of the LB, using a duty cycle whose structure is presented in Fig. 6. Duty cycle percentages are allocated based on the amount of WiFi traffic present at that time through the co-located WiFi AP: the higher the WiFi traffic, the lower the duty cycle percentage for communication through the NR-U interface. Furthermore, the AP cooperates in the NR-U operation by switching the WiFi transmissions on and off appropriately. For simplicity, access beacons and guard frames are not discussed. Each TDD frame is capable of serving a number of devices, as indicated by different colors. At the start, all the devices are allocated equal time durations. In this study, a heavily congested environment is considered, where the number of devices is greater than or equal to the number of slots in a frame.
Once access is granted, to get the optimum utilization of opportunistic RRs, time durations allocated for the devices are dynamically adjusted using device and network coordination.
This leads to unevenly allocated time durations for the devices at the point of convergence, based on their performance. In contrast to conventional LB operations, in this method devices get the opportunity to adjust both transmit time and power to reach a sub-optimum solution for the optimization problem presented in (3a) using RL principles [26].
To implement this solution, by considering (3a), the RL game $G$ is formulated with the following utility function. Utility function: The weighted volume capacity of device $m$ at a given location, $w^m_{(x,y,z)} C^V_m$, is used as the utility function. The set of probability values for the elements in $P$ after game round $l$ of $L$ total rounds is given by $\pi^m(l) = \{\pi^m_1(l), \pi^m_2(l), \ldots, \pi^m_S(l)\}$, where $\pi^m_s(l) \in \pi^m(l)$ is the probability of achieving the volume capacity target $C^V_{tar}$ with $p^m_s$. Since the sum weighted volume capacity of a cell is optimized, cell level values are used to update the probability function. When $s = s'$, $s' \in \mathcal{S}$, $\pi^m_s(l + 1)$ is obtained from $\pi^m_s(l)$ as given in [26], where $C^V_{max}$ is the maximum volume capacity that the SC can achieve, $\tau(l) = \frac{1}{l+1}$ and $g^m(l) = \mathbb{1}_c$ is the indicator function with condition $c$. Then $p^m_s(l + 1) = p^m_s$ is chosen with $s = \arg\max_{s' \in \mathcal{S}} \pi^m_{s'}(l)$, followed by an increment in time ($q \leftarrow q + 1$) for the next iteration, $t^m_q(l + 1) = t^m_q$. The optimization process and the strategy of the game $G$ are explained in Algorithm 2, and the complete operation on the UB is managed by Algorithm 3.
Over iterations, the game $G$ is expected to converge, reaching the convergence equilibrium for all $m \in M_U$ and $\{s \in \mathcal{S}, q \in Q\}$ with $\pi^m_s(l) > 0$ [26]. This holds under the assumption that the game has at least one equilibrium in pure strategies. In contrast to the equilibrium achieved in game theory [26], the equilibrium achieved with RL can be unstable over time under considerably dynamic environmental conditions. In such a situation, the agents can again reach a new convergence equilibrium under the new conditions. Hence, with RL a convergence equilibrium is always expected, giving a sub-optimal solution for the optimization problem.
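To give a feel for how a probability vector over transmit power steps can be adapted from round to round, the following Python sketch uses a generic linear reward-inaction style update with step size $\tau(l) = 1/(l+1)$ and the utility normalized by $C^V_{max}$. This particular update form is an assumption made for illustration; it is not claimed to be the exact update rule taken from [26], and all names and numbers are hypothetical.

```python
def update_probabilities(probs, chosen, utility, c_max, round_idx):
    """Shift probability mass towards the step that was just played.

    probs: current probabilities over power steps (sums to 1).
    chosen: index of the step played in this round.
    utility: achieved (weighted) volume capacity, 0 <= utility <= c_max.
    c_max: maximum achievable volume capacity, used for normalization.
    round_idx: game round l, giving the step size tau(l) = 1 / (l + 1).
    ASSUMED update form, for illustration only.
    """
    tau = 1.0 / (round_idx + 1)
    gain = tau * (utility / c_max)
    return [
        p + gain * ((1.0 if s == chosen else 0.0) - p)
        for s, p in enumerate(probs)
    ]


def best_step(probs):
    """Pick the step with the highest probability (arg max)."""
    return max(range(len(probs)), key=lambda s: probs[s])


probs = [0.25, 0.25, 0.25, 0.25]   # hypothetical uniform start
probs = update_probabilities(probs, chosen=2, utility=50.0,
                             c_max=100.0, round_idx=1)
selected = best_step(probs)
```

The update keeps the vector normalized, and because the step size shrinks with the round index, later rounds change the probabilities less, which is consistent with the convergent behavior discussed above.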

D. DEVICE AND NETWORK COORDINATION FOR SIMULTANEOUS OPERATION OF LB AND UB
In order to further reduce channel access delays while improving the utilization efficiency of opportunistically available RRs, a device and network coordination based approach is introduced for simultaneous use of the LB and UB. In this solution, the challenge of coordinating wireless cellular and WiFi networks is mitigated by using a co-located BS and WiFi AP. The challenge of making the devices more proactive is mitigated by allowing them to select appropriate time slots by themselves using the same QL algorithm. This decision is further supported with information provided by the BS on the availability of time slots in either band. However, the same device is not permitted to access both bands at the same time. As explained in Fig. 7, an integrated DL coordination frame is used to provide data for the Q-tables of the devices. This frame is designed by combining information from both bands and is transmitted through the LB.
The step by step operations related to device and network coordination in the allocation of time slots in the LB and UB are explained in Algorithm 4. In this algorithm, the number of devices detected, the set of detected devices and the set of time slots in a frame are given by $M_D$, $\mathcal{M}_D$ and $\mathcal{K}$, respectively. $L^{P,L}_m$ and $L^{P,U}_m$ represent the path loss values for device $m$ in the LB and UB, respectively. In this scheme also, the vacant slots are assigned to the devices based on the prioritization scheme. At a given instance, when both the LB and UB are available, the algorithm is capable of coordinating the devices and the two networks so as to assign the band with the minimum path loss to the device with the highest weighted volume capacity, and the remaining band to the device with the second highest weighted volume capacity, irrespective of the bands on which the devices attempted to access the channel.
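The band assignment rule stated at the end of this paragraph can be sketched in Python as shown below. This is a simplified illustration rather than Algorithm 4: the dictionary keys, the function name and the choice of the LB when both path losses are equal are assumptions of this example.

```python
def assign_bands(devices):
    """Assign LB/UB to the two devices with the highest weighted capacity.

    devices: list of dicts with hypothetical keys 'id', 'weighted_capacity',
    'pl_lb' and 'pl_ub' (path loss in dB on each band).
    Returns a dict mapping device id to 'LB' or 'UB'.
    """
    ranked = sorted(devices, key=lambda d: d["weighted_capacity"],
                    reverse=True)
    if not ranked:
        return {}
    first = ranked[0]
    # Band with the minimum path loss goes to the top-ranked device.
    # ASSUMPTION: the LB is preferred when both path losses are equal.
    best_band = "LB" if first["pl_lb"] <= first["pl_ub"] else "UB"
    assignment = {first["id"]: best_band}
    if len(ranked) > 1:
        # The remaining band goes to the second-ranked device, irrespective
        # of the band on which it attempted access.
        assignment[ranked[1]["id"]] = "UB" if best_band == "LB" else "LB"
    return assignment


# Hypothetical detected devices and path loss values.
detected = [
    {"id": "a", "weighted_capacity": 3.0, "pl_lb": 95.0, "pl_ub": 80.0},
    {"id": "b", "weighted_capacity": 7.5, "pl_lb": 90.0, "pl_ub": 85.0},
    {"id": "c", "weighted_capacity": 5.0, "pl_lb": 70.0, "pl_ub": 99.0},
]
result = assign_bands(detected)
```

Here device b has the highest weighted volume capacity and a lower path loss on the UB, so it receives the UB, and device c, ranked second, receives the LB.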

IV. SIMULATION RESULTS
A spherical SC with an approximate radius of 20 m is considered, with the BS at its center. This is the layer 2 SC in Fig. 1, containing 10 critical or priority areas, each 4 m in radius. The maximum allowed distance from a device to the center of the nearest critical area, $d^C_{max}$, is also set to 4 m. The devices have minimum and maximum transmit power values of 2.6 dBm and 26 dBm, respectively, with discrete power control steps in between. The threshold power value for the cell coverage, $P_E$, is set to −90 dBm.
For the path loss model, parameter values for the LB operated in the 28 GHz band are set as $61 \leq \alpha_{(x,y,z)} \leq 72$, $2 \leq \beta_{(x,y,z)} \leq 2.8$ and $\xi^m_{(x,y,z)} \sim \mathcal{N}(0, \sigma^2_{PL})$, where $\sigma_{PL}$, with $\sigma_{PL} = 8.7$, is an estimated random value based on the location [8]. Values for the same set of parameters used in the UB operated in the 6 GHz band are specified as $31.4 \leq \alpha_{(x,y,z)} \leq 34.7$, $3.49 \leq \beta_{(x,y,z)} \leq 3.85$ and $\xi^m_{(x,y,z)} \sim \mathcal{N}(0, \sigma^2_{PL})$ with $\sigma_{PL} = 4$ [8]. All the parameters are made sensitive to the vertical angle in the range of 0° to 90° (i.e., $z \geq 0$) by randomly selecting the parameters, including the line-of-sight (LOS) scenarios [8], [55]. However, for vertical angles in the range of −90° to 0° (i.e., $z < 0$), LOS is not allowed. TDD data frames of 10 ms with 100 time slots or sub-frames are used for the 5G NR and NR-U transmissions. The algorithms are implemented at the BS and the collaborative devices based on the assumption that all the devices, the BS and the servers are properly synchronized, with no additional core network or processing delays. The default learning rate for the QL algorithm is set to $\lambda = 0.1$ unless otherwise mentioned.
In calculating weighted capacities, three prioritization weighting factors $w^{1,m}_{(x,y,z)}$, $w^{2,m}_{(x,y,z)}$ and $w^{3,m}_{(x,y,z)}$ are considered, representing the operating height of the devices, user data criticality and the availability of an alternative band for communication, respectively. The minimum and maximum limits for each weighting factor are set to 0 and 1, with five intermediate levels. For the first weighting factor, weights are assigned in proportion to the altitudes of the devices. For the second weighting factor, data carrying control or emergency information is given the highest priority by allocating the highest weights. For the third weighting factor, a weight of 1 is assigned when there is no alternative band available for communication and 0 when there is an available band. As an example, if the channel access attempt is on the UB and the LB is fully occupied, a weight of 1 is assigned. If there are still some vacant slots in the LB, no weight is added in favor of that device.

A. EVALUATION OF COORDINATION EFFICIENCY
Three simple performance metrics, namely coordination efficiency $E_{C,t}$, sum volume capacity enhancement $C^V_{E,t}$ and overall coordination efficiency $E_{OC,t}$, are used to further evaluate the performance enhancement due to device and network coordination. In all these metrics, performance is evaluated with respect to the same iteration of a reference solution. The metrics are defined in terms of $O_t$, $O^R_t$, $C^V_{Sum,t}$ and $C^{R,V}_{Sum,t}$, which are the frame occupancy at iteration $t$ with the proposed solution, the frame occupancy at iteration $t$ under the reference method, the sum volume capacity at iteration $t$ with the proposed solution and the sum volume capacity at iteration $t$ under the reference method, respectively.
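As a rough illustration of how such metrics behave, the Python sketch below computes a relative gain over the reference value and combines two gains into an overall figure. The normalization by the reference value and the use of the mean of the two gains for the overall efficiency are assumptions made for this example, as are all the numbers; the results discussion only indicates that the first two metrics are driven by the differences between proposed and reference values and that the overall metric depends on their sum.

```python
def relative_gain(value, reference):
    """Percentage improvement of `value` over `reference`.

    ASSUMED form: 100 * (value - reference) / reference.
    """
    return 100.0 * (value - reference) / reference


def overall_efficiency(e_c, c_ve):
    """Combine the two gains. ASSUMED here to be their mean."""
    return 0.5 * (e_c + c_ve)


# Hypothetical values at an early iteration.
e_c = relative_gain(60.0, 40.0)      # frame occupancy, proposed vs reference
c_ve = relative_gain(90.0, 60.0)     # sum volume capacity, proposed vs reference
e_oc = overall_efficiency(e_c, c_ve)

# Once both schemes reach full occupancy, the gain vanishes.
e_c_converged = relative_gain(100.0, 100.0)
```

The last line reflects the behavior reported in the results: when the proposed and reference schemes both reach the same occupancy, the difference and hence the efficiency gain become zero.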

B. COMMUNICATION IN LB
For UL radio channel access and RR allocation, the performance of the coordinated QL assisted S-ALOHA algorithm for several device distribution densities is given in Fig. 8. Convergence of the algorithm for weighted sum volume capacity values is shown in Fig. 8(a), and the corresponding volume capacity values are given in Fig. 8(b). Here, weighted sum values are used as a performance measure for decision making in assigning time slots or allocating resources, whereas values without weights are used as a performance measure for the system. In addition, the difference between the results with and without the prioritization scheme [15] is a performance measure for the efficiency of the improved device and network coordination assisted solutions suggested in this study. Device densities of 20, 60 and 100 (or more) are investigated. Generally, sum volume capacity values increase with the number of devices until the maximum number is reached. Channel access interference and packet collisions are reduced for two reasons. The first is that the devices have to listen to the broadcast message containing the congestion report and other information before sending access request messages. The second is that the channel access time slots are selected based on the learning algorithm. Due to these reasons, the algorithm with the device prioritization mechanism shows better or faster convergence than the reference method in [15].
The performance of the QL schemes with and without the prioritization scheme for the cell is shown in Fig. 9. Occupancy of the sub-frames or slots with $\lambda = 0.1$ is shown in Fig. 9(a). With the prioritization scheme, the frame is occupied faster at the early iterations. In the case of 100 or more devices (the maximum number of devices), for iterations 1-4, the algorithm with the prioritized slot allocation mechanism shows frame occupancy rates that are better than those of the reference scheme [15] by 68.42%, 43.33%, 29.73% and 16.67%, respectively. Further, with the prioritization scheme, full channel occupancy is achieved in 50% less time, i.e., twice as fast as with the regular scheme (5 iterations with and 10 iterations without the prioritization scheme). Similar behavior is shown for the other device densities as well. This is a very important characteristic when serving devices with urgent or emergency data. Convergence of the two coordinated QL schemes under different learning rates is shown in Fig. 9(b). Faster convergence is observed as the learning rate increases, giving evidence of the proper functionality and implementation accuracy of the algorithms.

C. COMMUNICATION IN UB
Occupancy of the sub-frames or slots when the number of devices is greater than the number of available slots is shown in Fig. 10 under three duty cycle percentages: 20%, 40% and 60%. For all these duty cycle percentages, the curves with the prioritization scheme coincide with each other. Similarly, the curves without the prioritization scheme also coincide with each other. However, for all the duty cycle percentages, the schemes with prioritization reach 100% frame occupancy in considerably less time than the conventional method [15]. In the UB also, with the prioritization scheme, full channel occupancy is achieved in 50% less time, i.e., twice as fast as with the regular scheme (5 iterations with and 10 iterations without the prioritization scheme). In addition, for iterations 1-4, the algorithm with the prioritized slot allocation mechanism shows frame occupancy rates that are better than those of the regular algorithm [15] by 73%, 44%, 27% and 17%, respectively. These results are clear indicators of the success of the device and network coordination assisted mechanisms used in this work.
Results for the QL algorithm in the UB under different duty cycle percentages are shown in Fig. 11. Here, the number of devices is considered to be greater than the number of available time slots. Convergence of the plots for the weighted sum volume capacity values is shown in Fig. 11(a). When the device prioritization scheme is used, convergence is achieved faster than in the regular case [15] while supporting the QoS and other requirements of the devices and the SC. The corresponding sum volume capacity plots, which are the true performance indicators of the system, are shown in Fig. 11(b).
Upon getting access to the BS, communication performance is further improved with RR management done through RL principles assisted algorithm. This algorithm is expected to better utilize opportunistically acquired RR in the UB. At the start, in a given duty cycle about 10% of the resources are allocated for a single device instead of one slot. Subsequently, allocated time for each device and power are adjusted (remaining in the same duty cycle) to get the sub-optimum sum volume capacity performance while utilizing available RRs in an efficient manner. Overall performance of this scheme for the SC is shown in Fig. 12 under different duty cycle percentages. Convergence of the plots for the weighted sum volume capacity values under different duty cycle percentages are shown in Fig. 12(a) and corresponding sum volume capacity plots are shown in Fig. 12(b). In this case also, true performance of the system is reflected with the sum volume capacity plots.
Both the QL and RL based schemes converge in fewer than about 15 iterations while showing acceptable sum volume capacity values at convergence. This indicates that the schemes are suitable for real-time operation in wireless communication systems.

D. COORDINATION EFFICIENCY OF LB AND UB OPERATIONS
The relative performance improvement of the device prioritization assisted QL algorithm over the reference QL algorithm is shown in Fig. 13. The device and network coordination assisted mechanism and the device prioritization scheme are the two main reasons behind these results. The device and network coordination efficiency values for the LB and the UB are given in Fig. 13(a) and Fig. 13(b), respectively. At the initial iterations, coordination efficiency values of nearly 70% and over 75% are shown with the suggested solution for the scenario of 100 devices or more in the LB and for all the scenarios in the UB, respectively. With both $O_t$ and $O^R_t$ reaching 100% over iterations, the difference $(O_t - O^R_t)$ becomes zero, leading to zero coordination efficiency. The corresponding sum volume capacity enhancement values for the LB and the UB are given in Fig. 13(c) and Fig. 13(d), respectively. In these cases also, sum volume capacity enhancement values of nearly 70% and over 75% are shown with the suggested solution at the initial iterations for the scenario of 100 devices or more in the LB and for all the scenarios in the UB, respectively. With both $C^V_{Sum,t}$ and $C^{R,V}_{Sum,t}$ reaching their sub-optimum values over iterations, the difference $(C^V_{Sum,t} - C^{R,V}_{Sum,t})$ is drastically reduced, leading to very low sum volume capacity enhancement values. The overall coordination efficiency of the proposed algorithm over the reference method is shown in Fig. 14, where values for the LB under different device densities are shown in Fig. 14(a) and values for the UB under different duty cycle percentages are presented in Fig. 14(b).
An overall coordination efficiency of approximately 70% is shown for the case of 100 devices (or more) in the LB and for all the duty cycle scenarios of the UB. With the drastic reduction of both $E_{C,t}$ and $C^V_{E,t}$ over iterations, the summation $(E_{C,t} + C^V_{E,t})$ is also reduced, leading to very low or near zero overall coordination efficiency values.

E. DEVICE AND NETWORK COORDINATION FOR SIMULTANEOUS OPERATION IN LB AND UB
In order to better understand the efficient utilization of opportunistically available RRs, the performance of device and network coordination for simultaneous operation of the LB and UB is studied against that of sequential operation. The overall sum volume capacity of the BS in accessing and utilizing opportunistically available RRs on the LB and UB, sequentially and in parallel, with and without the device access prioritization scheme, is shown in Fig. 15. The UB is always accessed under the scenario of 100 devices or more, for which three duty cycle percentages, 20%, 40% and 60%, are considered. In Fig. 15(a), the UB is accessed once the LB is fully occupied. The device prioritization algorithm is capable of allocating all the opportunistic RRs in the LB during the first 7 iterations for a device density of 100 or more. Then, the remaining devices are allowed to access the opportunistically available RRs in the UB, where they are captured and allocated by the same algorithm from iteration 8 to 13, until convergence. Starting from iteration 14, for all the duty cycle percentages, the RL algorithm is used to optimize the RR utilization efficiency. In summary, about 20 and 35 iterations are spent to reach the final sub-optimum solutions with and without the prioritization mechanism, respectively. As indicated in Fig. 15(b), when the device density is 100 or more, both the UB and LB are accessed in parallel for all the duty cycle percentages. With the device prioritization algorithm, RRs in both bands are allocated within the first 7 iterations. Starting from iteration 8, the RL algorithm is used to optimize the RR utilization efficiency. About 11 and 20 iterations are used to reach the sub-optimum solutions with and without the device prioritization scheme, respectively.
Relative performance enhancement on device and network coordination related to parallel access of LB and UB for the prioritized QL scheme against the regular reference method measured for the SC is given in Fig. 16. Device and network coordination efficiency is given in Fig. 16(a) and sum volume capacity enhancement is given in Fig. 16(b). In both subplots, performance values are over 70% at the start of the iterations and they begin to decline over iterations based on the same reasons given for the sequential band access scenarios.
The overall coordination efficiency for the 3D SC is given in Fig. 17. The overall coordination efficiency measured for the device prioritization scheme against the conventional approach when the LB and UB are accessed in parallel is given in Fig. 17(a). Here also, performance values are over 70% at the beginning and start to fall due to the reasons explained under the sequential band access scenarios. Then, the overall coordination efficiency measured for the LB and UB when they are accessed in parallel with device prioritization, against when they are accessed sequentially using the conventional method, is presented in Fig. 17(b). For this, $E_{C,t}$ and $C^V_{E,t}$ are defined considering the sequential and parallel band utilization modes, and the resulting values are shown in Fig. 17(b). However, those efficiency values start to fall sharply once the allocation of RRs in the UB begins in the reference (sequential band access) method, after about iteration 7.

V. CONCLUSION
Device and network coordination assisted mechanisms for opportunistic utilization of RRs in 3D SC HetNets were studied in this work. Approaches for fast resource allocation and efficient resource utilization for the devices in dense wireless communication networks were presented while meeting different requirements of the applications and the devices. The problem addressed was the performance reduction in a 3D network caused by neglecting location-based information, the priority requirements of different data types, the availability of the LB for communication and delays in RR allocation, while satisfying the QoS requirements of the devices. For this problem, a QL and S-ALOHA based solution was developed with the help of device and network coordination, which is also capable of ameliorating the RACH congestion problem. Subsequently, an RL algorithm was presented to utilize UB RRs efficiently. In all these solutions, location-specific and 3D spatially distributed RRs in the LB and UB are utilized while recognizing their limited and opportunistic availability for the devices in heavily congested device distributions. The effectiveness of the study is shown through a set of results presented in terms of several performance metrics, including sum volume capacity and overall coordination efficiency.