5G New Radio: Dynamic Time Division Duplex Radio Resource Management Approaches

The Internet of Everything is driving a surge in demand that has placed a tremendous burden on the network. Accommodating this exponential increase in demand will require improved Radio Resource Management technology. The problem can be eased by using higher spectrum bands, reevaluating Time Division Duplex, and deploying Software Defined Networking and Network Function Virtualization in 5G New Radio (NR) networks. This work therefore provides an in-depth survey of recent resource management schemes for 5G NR enhancement that exploit both rule-based algorithms and machine learning methods. Radio resource management covers user allocation, antenna transmission power, bandwidth, and modulation scheme. Accordingly, this paper introduces three categories of radio resource management technologies: resource allocation, energy efficiency, and interference management. The discussion covers their potential and contributions, as well as the challenges faced in producing efficient 5G resource management schemes.


I. INTRODUCTION
Communication technology has evolved rapidly to support user demands across various applications. 5G New Radio (NR) is a promising technology that focuses on providing a platform with high data rates, high capacity, thousands of interconnected devices, and massive support for mission-critical applications [1]-[3]. 5G NR is a new radio access technology (RAT) developed by the 3rd Generation Partnership Project (3GPP) for 5G mobile networks, and it was designed to be the global standard for the 5G air interface. To realize this promise, the 5G system defines three use cases: ultra-reliable low-latency communication (uRLLC), massive machine-type communication (mMTC), and enhanced mobile broadband (eMBB). These use cases make it challenging to provide a reliable system, as each has different criteria and requirements to be met [4].
The associate editor coordinating the review of this manuscript and approving it for publication was Yiming Huo.
Time-division duplexing (TDD) applies time-division multiplexing to separate uplink and downlink transmissions in time. With TDD, a system can exploit the asymmetry between uplink and downlink traffic, and this benefit can be improved further with the emerging use of high frequencies above 10 GHz [5]-[8] and multiple subcarrier spacings that cover diverse requirements. TDD also allows the system to cope with large traffic variations [9]-[11] and to benefit from Software Defined Networking (SDN) [12]-[14]. However, with static LTE-TDD it is difficult to adapt quickly to traffic changes, which leads to underutilization of base station resources. Other limitations, such as pseudo-congestion and the lack of adequate TDD multi-connectivity schemes, can reduce resource flexibility and system performance [15]. Static TDD also requires all base stations to operate synchronously with the same uplink and downlink pattern, which is allocated statically and contributes to inefficient use of resources [16].
A dynamically configurable scheme named Dynamic Time Division Duplex (DTDD) has been presented to overcome this inefficient spectrum utilization. Because mobile broadband applications in Ultra Dense Networks (UDN) generate large, bursty data, the transmission bandwidth needed in the uplink and downlink is asymmetric and fluctuates with the traffic pattern [10]. DTDD addresses this by allowing the uplink and downlink frame configuration to be adjusted according to network needs, giving the system greater flexibility toward rapid demand changes. In 5G, DTDD reportedly allows neighboring cells to operate with different uplink and downlink subframe or slot configurations, which makes it a promising enhancement over conventional static TDD. Moreover, as compared in [17], with traditional static TDD the radio resource allocation depends on the average traffic load and the transmission bandwidth is fixed, whereas DTDD is based on the instantaneous traffic load, and the switching point between UL and DL in each radio frame is flexible. This property suits the current 5G NR aim of fast adaptation to rapid demand changes. Additionally, by adjusting the frame configuration, DTDD can achieve lower queuing delay and provide better quality of service (QoS) to the end user.
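The contrast between a static split chosen from the average load and a DTDD split driven by the instantaneous load can be illustrated with a minimal Python sketch. It assumes a hypothetical 10-slot frame in which each slot serves one unit of queued traffic in its direction, and at least one slot is always kept for each direction; the policy names and the bursty traffic trace are illustrative only and do not correspond to any 3GPP slot format.

```python
# Hypothetical model: a frame has `frame_slots` slots, each slot serves one
# unit of queued traffic in its direction (DL or UL).


def static_policy(dl_q, ul_q, frame_slots):
    """Static TDD: fixed split chosen from the average load (here 50/50)."""
    return frame_slots // 2


def dynamic_policy(dl_q, ul_q, frame_slots, min_dl=1, min_ul=1):
    """DTDD-style split proportional to the instantaneous DL/UL backlog."""
    total = dl_q + ul_q
    if total == 0:
        return frame_slots // 2
    dl_slots = round(frame_slots * dl_q / total)
    # Keep at least one slot in each direction.
    return max(min_dl, min(frame_slots - min_ul, dl_slots))


def run_frames(arrivals, policy, frame_slots=10):
    """Apply `policy` frame by frame and return the total backlog left at the end."""
    dl_q = ul_q = 0
    for dl_in, ul_in in arrivals:
        dl_q += dl_in
        ul_q += ul_in
        dl_slots = policy(dl_q, ul_q, frame_slots)
        ul_slots = frame_slots - dl_slots
        dl_q = max(0, dl_q - dl_slots)
        ul_q = max(0, ul_q - ul_slots)
    return dl_q + ul_q


# Hypothetical bursty trace: DL-heavy frames followed by UL-heavy frames.
BURSTY_TRACE = [(9, 1), (9, 1), (1, 9), (1, 9)]

if __name__ == "__main__":
    print(run_frames(BURSTY_TRACE, static_policy))   # 8
    print(run_frames(BURSTY_TRACE, dynamic_policy))  # 0
```

On this trace the fixed 5/5 split still has 8 units queued after four frames, while the load-driven split clears both queues every frame. The sketch ignores cross-link interference between neighboring cells, which is the main cost that DTDD pays for this flexibility.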
However, migrating to DTDD raises more complex issues in interference management, control signaling, and coverage [18]. Furthermore, a dynamic configuration can suffer from low uplink gain and from heavy system load. In addition, fast handoff and low delay are critical criteria in 5G UDN [10], [19], and if appropriate interference mitigation is not employed, a dynamic system may perform worse in interference-limited scenarios [20]. Therefore, techniques such as network slicing, machine learning, and other intelligent algorithms can be applied to enable further improvements and autonomous optimization [21]-[23]. Such an intelligent system not only uses historical data to make predictions for a given period but also keeps incorporating newly acquired data to refine its analysis of consumption patterns [24]. Table 1 compares TDD and DTDD.
Machine learning can improve today's communication systems and help overcome DTDD challenges by analyzing surrounding parameters to forecast peak traffic, resource utilization, and application types. The advantages of machine learning in communication systems can be seen in the development of mobile edge computing (MEC) [25], [26], where it improves efficiency and reduces coverage problems by using interference and inter-site distance measurements. With machine learning, many applications, such as vehicular networks, security, machine intelligence, and adaptive discovery and self-learning, can be implemented efficiently [27]. Evaluations of machine learning approaches highlight three important elements: rapid response, precise environment simulation, and self-adaptation. First, the system must respond rapidly in highly dynamic vehicular networks; with fast vehicle movement, it must support fast handover of wireless connections and rapid resource allocation [28], [29]. Second, the environment simulation must precisely capture various requirements and demands [30], [31]. Third, the model must be able to self-adapt and cope with changes in a dynamic network and topology. Moreover, as SDN architectures are adopted in communication technologies, such frameworks need not be limited to machine learning: the system can also run sophisticated algorithms that make decisions or observe the numerous parameters around it, potentially achieving performance comparable to machine learning for complex radio resource management schemes.
Radio resource management research spans a wide range of system controls, including user allocation, antenna transmission power, bandwidth, and modulation schemes, for example enhancing antenna arrays for better beamforming [32] and optimizing energy efficiency and resource management in mmWave [33]. In this paper, these strategies are grouped into three categories of resource management: resource allocation, energy efficiency, and interference mitigation, as shown in Fig. 1. The contributions of this paper are as follows:
• A list of important subjects in 5G resource management technologies and an overview of each technology.
• A review of related studies on machine learning and algorithm-based resource management techniques relevant to 5G networks, categorized into resource allocation, energy efficiency, and interference management.
This paper is organized as follows. Section II covers existing work on resource allocation in both LTE and 5G. Section III presents related work on energy efficiency for improving both network QoS and energy usage. Section IV reviews interference mitigation techniques for 5G systems. Finally, Section V details the issues and challenges for 5G resource management. The acronyms and notations used in this paper are listed in Table 2.

II. RESOURCE ALLOCATION
A. MACHINE LEARNING APPROACHES
Resource allocation assigns time, frequency, and spatial resources of the spectrum to the users that need them, and these resources must be managed to satisfy each user's QoS requirements. With machine learning, a resource management system can adapt dynamically to rapid changes. However, the choice of machine learning model is crucial for the system to perform well. Generally, machine learning comprises five major tasks, as in natural language processing: grouping, matching, translation, prediction, and decision-making.
Deep learning is a subfield of machine learning that is widely used across communication systems. It is built on artificial neural networks, prediction models inspired by the structure and function of the human brain, in which neuron nodes are connected like a web across hierarchical layers. This flexibility allows deep learning to predict complex data. The differences between machine learning and deep learning are shown in Fig. 2.
A comparison study of prediction models shows that deep learning outperforms other machine learning models on the five tasks above. Since then, deep learning has been seen as a significant advance for autonomous systems, including communication systems [34]. Deep learning also offers other benefits, such as high-accuracy channel estimation in multiple-input multiple-output (MIMO) systems, and it can help determine the direction of arrival (DOA) and estimate channels [35]. In addition, deep learning can provide stable average estimation errors in realistic outdoor scenarios, reduce time overhead, and achieve high beam selection accuracy, enabling reliable radio positioning and beam steering for 5G NR [36], [37]. Deep learning can also be employed for both offline and online learning, which is effective for learning the statistics of the wireless channel to obtain higher-resolution channel estimates [38]. 5G resource allocation also benefits from deep learning, as the constructed prediction models can minimize energy usage and improve packet loss rate and throughput [39], [40].
To look further into deep learning capabilities, Tang et al. [27] use a long short-term memory (LSTM) deep learning model to forecast the spatial traffic load of UDN base stations. The model makes spatial predictions so that corrective action can be taken to prevent or mitigate congestion intelligently before it occurs. The key advantage of LSTM is that it learns from both the latest data and the historical dataset when producing forecasts. Based on the relevant context, the scheme adjusts the uplink and downlink configurations so that base station queues receive suitable slots before congestion actually happens. By predicting future congestion spatially and avoiding or reducing it in advance, the approach achieves a much lower packet loss rate. Another notable deep learning study for 5G networks is reported by Hossain et al. [42], which aims to predict future traffic congestion. They propose intelligent radio resource allocation for 5G networks using a deep learning model that consists of a deep tree model and an LSTM network. The tree-like structure benefits from various activation functions and batch normalization layers from the root to the leaves, and a divide-and-conquer strategy lowers the number of parameters in each section. The output of the tree-based deep model is fed into the LSTM, with timestamps used to align the deep model's output with the correct temporal input of the LSTM. In this way, a low packet loss ratio and high throughput are achieved.
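The property attributed to LSTM above, combining the latest sample with a memory of earlier ones, lives in the cell state update. The following minimal NumPy sketch of a single LSTM cell assumes a one-dimensional load input and untrained random weights; the function names are hypothetical, and a real forecaster such as those in [27] or [42] would be trained (for example with backpropagation through time) and would feed its forecast into the UL/DL configuration step.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    Shapes: x (D,), h_prev and c_prev (H,), W (4H, D), U (4H, H), b (4H,).
    Gate order in the stacked pre-activation: input, forget, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much history to keep
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate update computed from the latest sample
    c = f * c_prev + i * g         # cell state blends history with the new sample
    h = o * np.tanh(c)
    return h, c


def forecast_next_load(load_history, W, U, b, w_out, b_out):
    """Run the cell over a 1-D load history and map the last hidden state to a forecast."""
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    for load in load_history:
        h, c = lstm_step(np.array([load]), h, c, W, U, b)
    return float(w_out @ h + b_out)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = 8
    # Untrained random weights: demonstrates shapes and data flow only.
    W, U, b = rng.normal(size=(4 * H, 1)), rng.normal(size=(4 * H, H)), np.zeros(4 * H)
    w_out, b_out = rng.normal(size=H), 0.0
    print(forecast_next_load([0.2, 0.5, 0.9, 0.4], W, U, b, w_out, b_out))
```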
Liao et al. [43] examine model-driven, deep learning-aided resource allocation. The authors created a new Deep Neural Network (DNN)-based optimization method built from a series of iterative procedures of the Alternating Direction Method of Multipliers (ADMM), treating the channel state information (CSI) as the learned weights. The channel information absent Q-learning (CIAQ) model is then proposed to train the DNN-based resource allocation optimizer without extensive data labeling. Spectral efficiency, energy efficiency, and fairness can be jointly optimized by adjusting the DNN-based optimization mechanism, which repeatedly runs the iterative ADMM procedures. The CIAQ model uses energy efficiency and fairness as the reward for learning the unknown weights. The algorithm can then make resource allocation decisions within a relatively small number of iterations based on limited CSI.
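To make the ADMM building block concrete, the sketch below runs plain scaled-form ADMM on a toy problem, box-constrained least squares (for instance, fitting per-link targets under a hypothetical power cap). It is not the formulation of Liao et al. [43]; in a model-driven design such iterations would be unrolled into DNN layers whose parameters are learned, which is not shown here.

```python
import numpy as np


def admm_box_least_squares(A, b, upper, rho=1.0, iters=200):
    """Scaled-form ADMM for: minimise 0.5 * ||A x - b||^2 subject to 0 <= x <= upper.

    The variable is split into x (quadratic term) and z (box constraint) with x = z.
    """
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    M = A.T @ A + rho * np.eye(n)   # x-update system matrix is the same every iteration
    Atb = A.T @ b
    for _ in range(iters):
        x = np.linalg.solve(M, Atb + rho * (z - u))  # unconstrained quadratic step
        z = np.clip(x + u, 0.0, upper)               # projection onto the feasible box
        u = u + x - z                                # scaled dual (running residual) update
    return z


if __name__ == "__main__":
    print(admm_box_least_squares(np.eye(3), np.array([-1.0, 0.5, 3.0]), upper=2.0))
```

With A set to the identity, the problem is separable and its solution is simply b clipped to [0, upper], which makes a convenient sanity check for the iteration.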
Besides deep learning, Wang and Tao [44] proposed a solution that optimizes the available resources by reducing cross-link interference using Q-learning. Q-learning is a model-free reinforcement learning (RL) method in which each agent improves its performance through trial and error [45]. Each femtocell configures its own uplink/downlink switching point based on local information to satisfy asymmetric traffic demand without causing severe interference between links. Moreover, to support high traffic using only estimated user locations, the random forest algorithm, a supervised machine learning technique, is used to learn the relationship between system parameters and estimated user positions. This achieves higher bandwidth efficiency and reduces CSI processing overhead by relying on location calculations [46].
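Tabular Q-learning for a single cell choosing its UL/DL switching point can be sketched in a few lines of Python. Everything about the environment is a hypothetical assumption for illustration: states are binned downlink shares of the offered traffic, actions are the number of DL slots in a 10-slot frame, the reward penalizes mismatch between the configured DL fraction and demand, and the next traffic state is drawn independently of the action. Cross-link interference from neighboring cells, central to [44], is omitted.

```python
import random

# Hypothetical toy environment for one cell.
DL_SHARES = [0.1, 0.3, 0.5, 0.7, 0.9]   # state i: downlink share of offered traffic
FRAME_SLOTS = 10
ACTIONS = list(range(1, FRAME_SLOTS))   # number of DL slots (1..9); the rest are UL


def reward(state, dl_slots):
    """Penalty for mismatch between configured DL fraction and DL demand."""
    return -abs(dl_slots / FRAME_SLOTS - DL_SHARES[state])


def greedy_action(q_row):
    """Index of the highest Q-value (first one on ties)."""
    return max(range(len(q_row)), key=lambda i: q_row[i])


def train_q_table(steps=20_000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in DL_SHARES]
    state = rng.randrange(len(DL_SHARES))
    for _ in range(steps):
        # Epsilon-greedy: explore at random, otherwise exploit (trial and error).
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))
        else:
            a = greedy_action(q[state])
        r = reward(state, ACTIONS[a])
        # Assumption: the next traffic state does not depend on the chosen action.
        next_state = rng.randrange(len(DL_SHARES))
        # Q-learning update towards the bootstrapped temporal-difference target.
        q[state][a] += alpha * (r + gamma * max(q[next_state]) - q[state][a])
        state = next_state
    return q


def learned_policy(q):
    """DL slot count chosen greedily in each traffic state."""
    return [ACTIONS[greedy_action(row)] for row in q]


if __name__ == "__main__":
    print(learned_policy(train_q_table()))  # [1, 3, 5, 7, 9]
```

Because each state's best configuration earns zero penalty, the greedy policy settles on the DL slot count that matches each traffic share.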
An application that will benefit from dynamic 5G slot allocation in the TDD frame is the Vehicular Ad-hoc Network (VANET). A VANET is a network of mobile nodes, each attached to a vehicle, realized as a wireless multihop network that is constrained by fast topology changes due to high node mobility [47]. Machine learning-based resource allocation can support connectivity under high-speed movement in VANETs. The fast-changing environment in a dense network calls for reinforcement learning-based allocation algorithms that not only consider various network status parameters but also incorporate them in the learning phase [48]. The growing number of vehicles equipped with computing technologies and wireless communication devices makes inter-vehicle communication a promising field for research, standardization, and development. VANETs enable a wide range of applications, such as collision prevention, pedestrian safety, dynamic route scheduling, and real-time traffic condition monitoring, as shown in Fig. 3 [49]. To build a robust VANET, Zhao et al. [50] proposed a context-aware framework based on context review and optimization using the multi-armed bandit (MAB) model. MAB is a reinforcement learning model with several options (arms), each associated with an unknown reward distribution; a classic example is a slot machine with several arms, each with its own probability of success. Using this technique, the dynamic TDD resource allocation can be updated according to network traffic by applying a contextual upper confidence bound (UCB) in the MAB model. The TDD configuration is also integrated with guaranteed resource allocation (GRA) to reduce energy consumption and increase the rendering range of mobile devices.
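The bandit idea can be illustrated with UCB1, the basic (non-contextual) upper-confidence-bound rule; the contextual variant used in [50] additionally conditions on observed context features, which this sketch omits. Each arm stands for a candidate TDD configuration, and the Bernoulli reward (1 if the frame served the offered traffic, 0 otherwise) and its success probabilities are hypothetical.

```python
import math
import random


def ucb1(success_probs, rounds=3000, seed=0):
    """UCB1 over arms = candidate TDD configurations (hypothetical Bernoulli rewards)."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # play every configuration once to initialise its estimate
        else:
            # Optimism under uncertainty: mean reward plus a confidence bonus.
            arm = max(range(k), key=lambda i: means[i] + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts, means


if __name__ == "__main__":
    counts, means = ucb1([0.3, 0.5, 0.8])
    print(counts)
```

The confidence bonus shrinks as a configuration is tried more often, so the policy gradually concentrates on the best configuration while still occasionally re-testing the others.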
An SDN controller with a hybrid fuzzy logic-guided genetic algorithm (H-FLGA) can also address multi-objective resource allocation for 5G-driven VANETs [51]. This solution helps network service providers introduce a consumer-centric network infrastructure based on the diverse needs of their customers. However, the model requires additional time to reconfigure services, which may degrade QoS. Besides, existing MEC research focuses only on resource allocation between mobile devices and MEC servers, ignoring the enormous computing resources of centralized cloud computing centers, which can increase transmission latency.
Machine learning in a passive optical network with LTE (PON-LTE) is also worth studying, as this technology can provide transmission rates of up to 10 Gbps. Sarigiannidis et al. [52] proposed a traffic-aware dynamic uplink/downlink configuration for managing the backhaul network on an SDN architecture. The system consists of a 10-gigabit-capable passive optical network (XG-PON) in the backhaul and LTE wireless access networks in the fronthaul. The traffic-aware system determines the most appropriate configuration for the entire hybrid network based on traffic dynamics. The scheme was tested with the seven uplink/downlink configurations predefined in LTE. To apply this technique in 5G networks, the preset slot configurations defined in the 3GPP specifications could be used to significantly reduce network congestion and jitter.
Game theory can also support 5G systems. El Bamby et al. [53] use a non-cooperative game between small cell base stations, each minimizing its total delay over uplink and downlink flows. To solve this game, a self-organizing uplink/downlink resource configuration scheme for TDD-based small cell networks is proposed, allowing each small cell base station to autonomously estimate and learn its uplink and downlink loads while optimizing its uplink/downlink configuration. In this game, each small cell base station learns and calculates its current uplink and downlink delay based on its traffic load, interference, and flow-level dynamics. The process relies solely on its instantaneous observations and uses the estimated values to adjust its uplink/downlink switching point.
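The structure of such a non-cooperative game, in which each small cell picks its own UL/DL split while its choice affects its neighbors, can be sketched with best-response dynamics. The cost model below is a hypothetical assumption, not the one in [53]: a crude per-slot load term stands in for queueing delay, and a penalty proportional to the difference between configurations stands in for cross-link interference. Because this penalty is symmetric between cells, the game is an exact potential game, so repeated strict best replies must stop at a pure Nash equilibrium.

```python
N_SLOTS = 10  # hypothetical frame length; each cell picks DL slots in 1..N_SLOTS-1


def cell_cost(i, config, loads, coupling):
    """Cost of cell i: a crude delay proxy plus a cross-link interference penalty.

    loads[i] = (dl_load, ul_load); config[i] = number of DL slots of cell i.
    """
    dl_load, ul_load = loads[i]
    delay_proxy = dl_load / config[i] + ul_load / (N_SLOTS - config[i])
    mismatch = sum(abs(config[i] - config[j]) for j in range(len(config)) if j != i)
    return delay_proxy + coupling * mismatch


def best_response_dynamics(loads, coupling=0.2, max_rounds=1000):
    """Let each cell in turn switch to its best reply until no cell wants to move."""
    config = [N_SLOTS // 2] * len(loads)
    for _ in range(max_rounds):
        changed = False
        for i in range(len(loads)):
            def cost_if(s):
                return cell_cost(i, config[:i] + [s] + config[i + 1:], loads, coupling)
            best = min(range(1, N_SLOTS), key=cost_if)
            if cost_if(best) < cost_if(config[i]) - 1e-12:  # strict improvement only
                config[i] = best
                changed = True
        if not changed:
            break
    return config


def is_nash_equilibrium(config, loads, coupling=0.2):
    """True if no cell can lower its own cost by changing only its own configuration."""
    for i in range(len(config)):
        current = cell_cost(i, config, loads, coupling)
        for s in range(1, N_SLOTS):
            trial = config[:i] + [s] + config[i + 1:]
            if cell_cost(i, trial, loads, coupling) < current - 1e-12:
                return False
    return True


if __name__ == "__main__":
    loads = [(6.0, 2.0), (2.0, 6.0), (4.0, 4.0)]  # hypothetical (DL, UL) loads
    cfg = best_response_dynamics(loads)
    print(cfg, is_nash_equilibrium(cfg, loads))
```

Raising `coupling` pulls neighboring cells toward aligned configurations (less cross-link interference); setting it to zero lets each cell follow only its own traffic asymmetry.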

B. RULE-BASED APPROACHES
One common slot allocation (uplink and downlink) issue in flexible TDD frame architectures, such as that in Fig. 4, is selecting the best configuration, especially in multicell systems. Using Lyapunov optimization, a mode selection and resource allocation scheme has been developed to address the challenging interference characteristics. The proposed dynamic algorithm selects and allocates uplink and downlink resources driven by the accumulated network-layer queues and virtual queues. In addition, the problematic power constraint in the slot allocation subproblem can be decoupled by a heuristic approach [54]. Another heuristic approach is studied by Lukowa et al. [55], who proposed stand-alone scheduling with semi-static synchronization to reduce interference sensitivity in a robust TDD system. The idea is to separate the scheduling schemes and use interference sensing and Successive Interference Cancellation (SIC) receivers to minimize cross-link interference. Besides, a new partly distributed scheduling scheme is proposed based on a limited exchange of packet-delay information. The simulation results show that the separate schemes benefit significantly from semi-static slot coordination and achieve high median performance in both uplink and downlink.
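The core of Lyapunov-based scheduling is the drift-plus-penalty rule: in each slot, pick the action that maximizes queue-weighted service minus V times a penalty such as power. The following Python sketch applies that rule to a single cell choosing between a DL and a UL slot; the service rate, per-direction power costs, arrivals, and V are hypothetical, and the virtual queues and mode selection of [54] are not modeled.

```python
SERVICE_PER_SLOT = 8                 # hypothetical units served by one slot
POWER_COST = {"DL": 2.0, "UL": 1.0}  # hypothetical per-slot power penalty


def choose_direction(q_dl, q_ul, v):
    """Drift-plus-penalty: maximise queue-weighted service minus v * power."""
    score_dl = q_dl * SERVICE_PER_SLOT - v * POWER_COST["DL"]
    score_ul = q_ul * SERVICE_PER_SLOT - v * POWER_COST["UL"]
    return "DL" if score_dl > score_ul else "UL"


def simulate_slots(arrivals, v=1.0):
    """Serve, then add arrivals; return final queues and the peak total backlog."""
    q_dl = q_ul = 0
    peak = 0
    for a_dl, a_ul in arrivals:
        if choose_direction(q_dl, q_ul, v) == "DL":
            q_dl = max(0, q_dl - SERVICE_PER_SLOT)
        else:
            q_ul = max(0, q_ul - SERVICE_PER_SLOT)
        q_dl += a_dl
        q_ul += a_ul
        peak = max(peak, q_dl + q_ul)
    return q_dl, q_ul, peak


if __name__ == "__main__":
    # Hypothetical asymmetric traffic: 3 DL units and 1 UL unit arrive per slot.
    print(simulate_slots([(3, 1)] * 1000))
```

A larger V puts more weight on saving power at the cost of longer queues. With the values above, the rule always serves the longer queue, so the total backlog never exceeds 19 units.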
Saraereh et al. [56] proposed an algorithm to increase wireless spectrum efficiency in dense D2D networks by allowing multiple D2D users to share one cellular user's channel bandwidth. It also maximizes the service quality of cellular and D2D users by raising the signal-to-interference-plus-noise ratio (SINR) threshold. Preference lists of D2D users and channels are created according to each deployed device's contribution to the system's total power objective function. Finally, to maximize the total capacity of D2D users, the authors use the Kuhn-Munkres (KM) algorithm to find the optimal matching between D2D clusters and cellular channels. In addition, uplink and downlink resource allocation is strongly coupled through cross-interference constraints, as shown in Fig. 5. Sapountzis et al. [57] therefore proposed an algorithm that decomposes the problem into separate subproblems, resulting in an efficient framework. The algorithm associates users with small cells to optimize the system or selected network-centric metrics, such as spectral efficiency and load balancing, and chooses the TDD uplink/downlink configuration of each cell that best matches its uplink/downlink traffic demand for that metric. It also considers the TDD uplink/downlink configurations of nearby base stations to avoid cross-interference. Table 3 summarizes related works on machine learning and algorithm-based techniques for resource allocation, with their key methods and contributions.
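The matching step can be illustrated with a toy capacity matrix in which entry (i, j) is a hypothetical sum capacity of D2D cluster i on cellular channel j. The sketch below finds the maximum-weight one-to-one matching by exhaustive search, which yields the same optimum the Kuhn-Munkres algorithm finds, but in factorial rather than polynomial time, so it is only suitable for tiny examples; for realistic sizes, `scipy.optimize.linear_sum_assignment` with `maximize=True` solves the same assignment problem.

```python
from itertools import permutations


def best_matching(capacity):
    """Exhaustive maximum-weight matching: cluster i -> channel perm[i].

    Same optimum as Kuhn-Munkres, but O(n!) time, so only for tiny n.
    """
    n = len(capacity)
    best_perm, best_sum = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(capacity[i][perm[i]] for i in range(n))
        if total > best_sum:
            best_perm, best_sum = perm, total
    return best_perm, best_sum


if __name__ == "__main__":
    # Hypothetical capacities (rows: D2D clusters, columns: cellular channels).
    cap = [[4, 1, 3],
           [2, 0, 5],
           [3, 2, 2]]
    print(best_matching(cap))  # ((0, 2, 1), 11)
```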

III. ENERGY EFFICIENCY
A. MACHINE LEARNING APPROACHES
Energy efficiency in a communication system is usually defined as the number of bits that can be sent per unit of energy consumed, quantified in bits per Joule. The power needed to transmit data is therefore the determining factor of a device's energy efficiency. With recent 5G NR approaches, the network can support a massive number of devices simultaneously. However, in a high-density network such as a UDN, energy becomes an important resource to manage, since 5G NR targets efficient data transmission under load together with low energy consumption. To improve energy efficiency, Liang et al. [58] proposed an energy-efficient resource allocation scheme based on game theory clustering. The scheme has two stages: clustering and resource allocation. In the clustering stage, a modified K-means algorithm dynamically changes the number of base station clusters based on base station density. In addition, a key objective of 5G wireless technology is to support services with significantly different criteria, which requires adequate numerology and frame structures for allocating radio resources. Therefore, in the second stage, an adaptive two-dimensional resource allocation improves the energy efficiency of service transmissions with heterogeneous latency requirements. The sliding window (SW) scalability algorithm [59] is used to exploit the frequency and time diversity of the resource grid and frequency-selective distribution, together with agile "on-off" power amplifier (PA) operation. AlQerm et al. [61] proposed a centralized resource allocation scheme using online learning that maximizes energy efficiency while meeting the QoS requirements of all users.
To improve the learning process, their model-free learning scheme considers users' preferences in resource block allocation and uses compact state-representation-based learning techniques. To tackle real-time resource management with machine learning in cloud computing networks, Din et al. [62] introduced a hierarchical model for Mobile Cloud Computing based on a novel design of 5G systems. This hierarchical design is divided into three layers: the foglet layer, the operation layer, and the coordination layer. These layers benefit the system architecture by providing efficient resource sharing and assistance via cloud services.
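The bits-per-Joule definition at the start of this section can be made concrete with a small calculation. The sketch assumes a Shannon-rate link and a simple power model in which the transmit power is divided by the power-amplifier efficiency and a fixed circuit power is added; the function name and all numbers are hypothetical.

```python
import math


def energy_efficiency(bandwidth_hz, sinr, tx_power_w, pa_efficiency, circuit_power_w):
    """Energy efficiency in bits per Joule = achievable rate / total consumed power."""
    rate_bps = bandwidth_hz * math.log2(1.0 + sinr)              # Shannon rate (bit/s)
    total_power_w = tx_power_w / pa_efficiency + circuit_power_w  # consumed power (W = J/s)
    return rate_bps / total_power_w


if __name__ == "__main__":
    print(energy_efficiency(1e6, 3.0, 1.0, 0.5, 2.0))  # 500000.0
```

For example, 1 MHz at an SINR of 3 gives 2 Mbit/s; with 1 W transmit power, 50% PA efficiency, and 2 W circuit power, the total is 4 W, so the efficiency is 500,000 bits per Joule. Raising the transmit power increases the rate only logarithmically while increasing consumption linearly, which is why energy-efficient schemes do not simply maximize power.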
For the uplink of a heterogeneous network with femtocells overlaid on a macrocell, Munir et al. [63] formulated a game-theoretic framework with two-layer games to maximize energy efficiency while optimizing network resources. In the outer layer, each femtocell access point (FAP) maximizes its users' data rate by choosing either a sub-6 GHz or a mmWave frequency band; the solution of this non-cooperative game is found as a pure-strategy Nash equilibrium. In the inner layer, a dual decomposition strategy yields an energy-efficient user association method subject to minimum-rate and maximum-transmit-power constraints.
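Dual decomposition is easiest to see on a textbook problem: maximizing the sum of ln(1 + g_i p_i) over channels under a total power budget. Relaxing the budget with a Lagrange multiplier splits the problem into independent per-channel subproblems whose solution is water-filling, p_i = max(0, mu - 1/g_i), where the water level mu is the reciprocal of the multiplier; bisection on mu then enforces the budget. This generic sketch is not the user association problem of [63], and the gains and budget are hypothetical.

```python
def water_filling(gains, total_power, iters=100):
    """Maximise sum(ln(1 + g_i * p_i)) s.t. sum(p_i) <= total_power, p_i >= 0."""

    def allocation(level):
        # Per-channel subproblem solution for water level mu = 1 / lambda.
        return [max(0.0, level - 1.0 / g) for g in gains]

    lo, hi = 0.0, total_power + max(1.0 / g for g in gains)
    for _ in range(iters):  # bisection on the water level (i.e. on the dual variable)
        mid = 0.5 * (lo + hi)
        if sum(allocation(mid)) > total_power:
            hi = mid
        else:
            lo = mid
    return allocation(lo)


if __name__ == "__main__":
    print(water_filling([10.0, 1.0, 0.1], total_power=2.0))
```

Strong channels (large g_i) have a low "floor" 1/g_i and receive more power, while channels whose floor lies above the water level receive none.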
For cloud RANs, further enhancement is needed to reduce power consumption while meeting the demands of wireless users over long periods of operation, which can be achieved with deep reinforcement learning (DRL). Xu et al. [64] present a novel DRL-based framework for power-efficient resource allocation in cloud RANs, in which the state space, action space, and reward function of the DRL agent are defined. In addition, a Deep Neural Network (DNN) is used to approximate the action-value function, and the resource allocation problem is formally formulated as a convex optimization problem.

B. RULE-BASED APPROACHES
Rule-based approaches have also been applied to energy efficiency, such as the one introduced by Hao et al. [65], who proposed a modified power control (MPC) algorithm with low complexity and high flexibility. Its main goal is to dynamically adjust the power control step size so that the transmit power converges rapidly to the target performance. It also sets a threshold that stops the adjustment process from continuing indefinitely with the power oscillating around the target. This quick start lets MPC adjust the interfering cell's transmit power faster than other algorithms. Table 4 summarizes related works on machine learning and algorithm-based techniques for energy efficiency, with their focus, contributions, and key methods.
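The adaptive-step idea can be sketched as follows: the step is proportional to the remaining SINR error, so large errors are corrected quickly and small ones gently, and a tolerance stops the loop instead of letting the power oscillate around the target. The linear-in-dB link model, gain, tolerance, and 23 dBm cap are hypothetical, and this is not the exact MPC rule of [65].

```python
def adaptive_power_control(p_start_dbm, target_sinr_db, link_offset_db,
                           gain=0.5, tolerance_db=0.5, p_max_dbm=23.0, max_iters=100):
    """Return (final power in dBm, number of updates performed)."""
    p = p_start_dbm
    for it in range(max_iters):
        sinr_db = p + link_offset_db         # hypothetical model: SINR tracks power in dB
        error_db = target_sinr_db - sinr_db
        if abs(error_db) < tolerance_db:     # stop inside the band instead of oscillating
            return p, it
        # Step size shrinks as the error shrinks; never exceed the power cap.
        p = min(p_max_dbm, p + gain * error_db)
    return p, max_iters


if __name__ == "__main__":
    print(adaptive_power_control(0.0, 10.0, -5.0))  # (14.53125, 5)
```

Starting from 0 dBm with a 15 dB shortfall, the loop reaches the 0.5 dB tolerance after 5 updates, whereas a fixed 1 dB step would need 15.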
For LTE-A networks with relay nodes in TDD mode, Load-Based Power Saving (LBPS) and Virtual Time can be incorporated into the architecture for energy savings. This approach, proposed by Yang et al. [66], integrates sleep scheduling for relay nodes and the user devices under the base station, as shown in Fig. 6. Three top-down (TD) schemes, namely TD-Aggr, TD-Split, and TD-Merge, were compared with three bottom-up (BU) schemes. In the proposed work, the load and channel capacity of the backhaul network are used to determine the sleep pattern of all relay nodes, and the sleep period of the user equipment (UE) under each relay node is then determined accordingly. Their results show that BU-Split is the best power-saving scheme but requires higher computational effort, while TD-Merge and BU-Aggr offer a better balance between operational overhead and performance. Zhang et al. [68] introduced an energy-efficient resource management algorithm based on a power consumption model of heterogeneous cloud radio access networks, which adopts the average energy efficiency of the entire network as the optimization target, subject to constraints on each user's maximum transmit power, average power, and minimum data rate. Another energy efficiency-related work formulates a low-power algorithm as a non-convex optimization that considers statistical channel state information at the transmitter and QoS constraints [69]. To balance system performance and computational complexity, suboptimal power allocation and user scheduling with low computational complexity are used to minimize total power consumption. The design exploits the heterogeneity of the QoS requirements to determine the successive interference cancellation decoding order. The authors achieved close-to-optimal performance and significantly outperformed the conventional orthogonal multiple access scheme. The results also showed the effectiveness of their scheme in exploiting QoS heterogeneity to reduce power consumption.
In addition, Wang et al. [67] introduced an iterative auction algorithm to improve UE energy efficiency, in which D2D users act as bidders for channel resources and the cellular network acts as the auctioneer. This method provides substantial benefits in resource utilization, improving user throughput and extending UE battery life.
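An iterative auction of this kind can be sketched with Bertsekas' auction algorithm for the assignment problem: unassigned bidders (D2D pairs) bid for the channel with the highest net value (value minus current price), raising its price by the gap to their second-best option plus a small epsilon, and an outbid owner rejoins the queue. The values are hypothetical, and the bidding rule is the textbook one rather than the specific mechanism of [67]. With integer values and epsilon below 1/n, the final assignment is optimal.

```python
def auction_assignment(values, eps=None):
    """Bertsekas-style auction: values[bidder][channel] -> (channel per bidder, prices)."""
    n = len(values)
    if eps is None:
        eps = 1.0 / (n + 1)          # < 1/n, so integer-valued problems end optimal
    prices = [0.0] * n
    owner = [None] * n               # owner[channel] = bidder currently holding it
    assigned = [None] * n            # assigned[bidder] = channel won
    unassigned = list(range(n))
    while unassigned:
        bidder = unassigned.pop()
        net = [values[bidder][j] - prices[j] for j in range(n)]
        best = max(range(n), key=lambda j: net[j])
        second = max((net[j] for j in range(n) if j != best), default=net[best])
        prices[best] += net[best] - second + eps   # bid increment
        if owner[best] is not None:                # previous owner is outbid
            assigned[owner[best]] = None
            unassigned.append(owner[best])
        owner[best] = bidder
        assigned[bidder] = best
    return assigned, prices


if __name__ == "__main__":
    vals = [[4, 1, 3], [2, 0, 5], [3, 2, 2]]  # hypothetical channel values per D2D pair
    print(auction_assignment(vals)[0])         # [0, 2, 1]
```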

IV. INTERFERENCE MITIGATION
A. MACHINE LEARNING APPROACHES
5G wireless networks are expected to mix network tiers of different sizes, transmit powers, and backhaul connections, as well as different radio access technologies (RATs), accessed by an unprecedented number of smart, heterogeneous wireless devices. These architectural enhancements and advanced physical-layer technologies, such as high-order spatial multiplexing in MIMO communications, result in a multi-tier system that further complicates interference mitigation. AlQerm et al. proposed a system that uses an online learning algorithm for efficient resource allocation to mitigate the cross-tier interference affecting macro users, and that can adapt its control and modulation. Their evaluation shows that the online scheme outperforms the alternatives and significantly improves network performance [70].
One interference-related study, by Aihara et al., focused on reducing interference among wireless nodes covering a large area [71]. This work enables a node to identify the ongoing communication of other nodes in a long-range wide area network (LoRaWAN), resulting in better packet transmission. The LoRaWAN nodes are equipped with Q-learning, and the training is used for resource allocation to improve packet delivery performance. The Q-learning applied in LoRaWAN effectively determines packet priority through what is called a Q-reward [71]. This improves the SINR and thereby alleviates interference issues.
In addition, Deb et al. [72] proposed LeAP, a measurement-driven machine learning model for power control in 4G LTE that handles uplink interference. The data-driven approach has the inherent advantage that the solution adapts to the network's operation, distribution, and topology, which become increasingly heterogeneous with multiple cell overlays. The LeAP system design has two main components: first, UE measurement statistics that are concise yet descriptive enough to capture network dynamics; second, a learning model that uses the recorded measurements to set power control parameters that maximize network performance.
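For context, LTE uplink power is set by the open-loop formula in 3GPP TS 36.213, whose per-cell parameters P0 and the path-loss compensation factor alpha are the kind of knobs a measurement-driven scheme can tune. The sketch below implements a simplified form that omits the transport-format term and the closed-loop correction; the function name and example values are hypothetical.

```python
import math


def pusch_power_dbm(n_prb, p0_dbm, alpha, pathloss_db, p_cmax_dbm=23.0):
    """Simplified LTE open-loop PUSCH power (Delta_TF and closed-loop f(i) omitted):
    min(P_CMAX, 10*log10(M_PUSCH) + P0 + alpha * PL), all in dB/dBm."""
    return min(p_cmax_dbm, 10.0 * math.log10(n_prb) + p0_dbm + alpha * pathloss_db)


if __name__ == "__main__":
    print(pusch_power_dbm(10, -80.0, 0.8, 100.0))  # ~10 dBm
    print(pusch_power_dbm(10, -80.0, 0.8, 150.0))  # capped at 23 dBm
```

For 10 PRBs, P0 = -80 dBm, alpha = 0.8, and 100 dB path loss, the UE transmits about 10 dBm; at 150 dB path loss the formula would ask for 50 dBm, so the UE is capped at 23 dBm. Choosing alpha below 1 (fractional path-loss compensation) lowers the power of cell-edge UEs and hence their interference to neighboring cells.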
Using a deep neural network (DNN) for interference mitigation, Haorun et al. [73] proposed an approach that treats the input-output relationship of a signal processing algorithm as an unknown nonlinear mapping and learns to approximate it. A DNN is a powerful class of machine learning model built by stacking layers of neural networks along the depth and width of smaller architectures, and DNNs have recently demonstrated strong discriminative and representation learning capabilities across a wide range of applications [74]. If a moderately sized DNN can learn the nonlinear mapping accurately, the signal processing task can be performed efficiently, because passing the input through a DNN requires only a small number of simple operations. The authors then identify a class of optimization algorithms that can be accurately approximated by a fully connected DNN and compare the approach with the weighted minimum mean square error (WMMSE) algorithm.
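The point that inference needs only a few simple operations is visible in a forward pass: a few matrix-vector products and elementwise nonlinearities. The NumPy sketch below maps the flattened channel gains of K links to K transmit powers in [0, p_max]; the function names and layer sizes are hypothetical and the weights are random, whereas the approach in [73] would train the network to reproduce the outputs of an iterative algorithm such as WMMSE.

```python
import numpy as np


def init_mlp(layer_sizes, seed=0):
    """Random (untrained) weights for a fully connected network."""
    rng = np.random.default_rng(seed)
    return [
        (rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def mlp_power_allocation(params, channel_gains, p_max):
    """Map flattened channel gains to one transmit power per link in [0, p_max]."""
    h = np.asarray(channel_gains, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(0.0, W @ h + b)           # ReLU hidden layers
    W, b = params[-1]
    return p_max / (1.0 + np.exp(-(W @ h + b)))  # sigmoid output scaled to the power range


if __name__ == "__main__":
    K = 3  # hypothetical number of interfering links
    params = init_mlp([K * K, 32, 32, K])
    gains = np.random.default_rng(1).exponential(size=K * K)  # toy K x K gain matrix
    print(mlp_power_allocation(params, gains, p_max=1.0))
```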
Li et al. [75] established an apprenticeship-based power control method that allows a secondary user to share the primary user's spectrum without causing harmful interference. The primary and secondary users were assumed to operate in a non-cooperative fashion. A set of sensor nodes is deployed spatially to help the secondary user collect received signal strength information at various locations in the wireless sensor network. The authors developed a DRL-based method with which the secondary user intelligently adjusts its transmit power so that both users can successfully transmit their data with the required quality of service.
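A DRL agent of the kind used in [75] needs an observation and a reward. The sketch below assumes a simplified single-antenna model with hypothetical link gains: the secondary user (SU) observes the received signal strength at each sensor, chooses among discrete power levels, and receives a reward only when both the primary user (PU) and the SU meet their SINR targets. The gain values, thresholds, and reward of 1 are illustrative, not taken from [75].

```python
def sinr(p_tx, g_direct, p_interferer, g_cross, noise):
    """Signal-to-interference-plus-noise ratio of one link (linear scale)."""
    return p_tx * g_direct / (p_interferer * g_cross + noise)


def sensor_rss(p_pu, p_su, g_pu_to_sensors, g_su_to_sensors, noise):
    """State observed by the SU: total received power at each sensor."""
    return [p_pu * a + p_su * b + noise
            for a, b in zip(g_pu_to_sensors, g_su_to_sensors)]


def sharing_reward(p_pu, p_su, gains, noise, sinr_pu_min, sinr_su_min):
    """Hypothetical reward: 1 if both users meet their SINR targets, else 0."""
    sinr_pu = sinr(p_pu, gains["pu_pu"], p_su, gains["su_pu"], noise)
    sinr_su = sinr(p_su, gains["su_su"], p_pu, gains["pu_su"], noise)
    return 1.0 if sinr_pu >= sinr_pu_min and sinr_su >= sinr_su_min else 0.0


gains = {"pu_pu": 1.0, "su_pu": 0.1, "su_su": 1.0, "pu_su": 0.2}
levels = [0.1, 0.5, 1.0, 2.0]  # discrete SU power levels (the agent's action set)
print([sharing_reward(1.0, p, gains, 0.01, 5.0, 1.0) for p in levels])
# [0.0, 1.0, 1.0, 0.0]
```

The agent never sees the gains directly; it must infer from the sensor readings which power levels keep both users above their targets, which is what the DRL policy learns.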
Another system that can benefit from DRL in mitigating interference is the unmanned aerial vehicle (UAV). A UAV is a class of aircraft that can fly without a pilot on board [76]. Unmanned aircraft systems consist of the aircraft, sensor payloads, and a ground control station, which communicate with various access points, as shown in Fig. 7. They can be controlled by onboard electronic equipment or by control equipment on the ground. When it is controlled remotely from the ground, the aircraft is called a Remotely Piloted Vehicle (RPV) and requires reliable, fast-response communication support [77]. Therefore, it is important to achieve a trade-off between maximizing energy efficiency and minimizing both delay and the interference caused to the ground network along the UAV's path. A DRL model based on echo state network (ESN) cells is proposed to address this trade-off [78]. The deep ESN architecture is trained to enable each UAV to map each observation of the network state to an action, with the goal of minimizing a sequence of time-dependent utility functions. Each UAV uses the ESN to learn its optimal path, transmission power, and cell association vector at different locations along its path.
One of the biggest challenges in interference management is the deployment of femtocells within macrocell coverage, which can degrade macrocell performance because of overlapping cells, as shown in Fig. 8. Therefore, Essayed et al. [80] addressed interference management in distributed femtocell networks by using reinforcement learning in the Medium Access Control (MAC) protocol. This approach uses a Q-Learning model to perform distributed power allocation on a GNU Radio and Universal Software Radio Peripheral (USRP) platform. Both autonomous and cooperative learning methods are applied across network nodes. The Q-Learning model aims to maximize the aggregate femtocell capacity while preserving the QoS of macrocell users.
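Each UAV in [78] relies on an echo state network, a recurrent model whose reservoir weights are fixed at random and whose output layer alone is trained. The following is a minimal ESN sketch under stated assumptions: the toy task (recalling the previous input) and the ridge-regression readout are illustrative choices, whereas in [78] the ESN is trained within a reinforcement learning loop to output actions for path, power, and cell association.

```python
import numpy as np


class EchoStateNetwork:
    """Minimal ESN: fixed random reservoir, trained linear readout."""

    def __init__(self, n_in, n_res, n_out, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
        w = rng.uniform(-0.5, 0.5, size=(n_res, n_res))
        # Scale the recurrent weights to the target spectral radius, a common
        # heuristic for obtaining the echo state (fading memory) property.
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w = w
        self.w_out = np.zeros((n_out, n_res))
        self.state = np.zeros(n_res)

    def reset(self):
        self.state = np.zeros_like(self.state)

    def step(self, u):
        """Update the reservoir with input u and return the readout."""
        self.state = np.tanh(self.w_in @ u + self.w @ self.state)
        return self.w_out @ self.state

    def fit_readout(self, inputs, targets, ridge=1e-6):
        """Fit only the output weights by ridge regression on collected states."""
        self.reset()
        states = []
        for u in inputs:
            self.step(u)
            states.append(self.state.copy())
        x = np.array(states)                  # (T, n_res)
        y = np.asarray(targets, dtype=float)  # (T, n_out)
        a = x.T @ x + ridge * np.eye(x.shape[1])
        self.w_out = np.linalg.solve(a, x.T @ y).T


rng = np.random.default_rng(1)
inputs = rng.uniform(-1.0, 1.0, size=(200, 2))  # toy "network state" observations
targets = np.roll(inputs[:, :1], 1, axis=0)     # toy target: previous first input
esn = EchoStateNetwork(n_in=2, n_res=50, n_out=1)
esn.fit_readout(inputs, targets)
esn.reset()
preds = np.array([esn.step(u) for u in inputs])
print(np.mean((preds - targets) ** 2))
```

Because only the readout is trained, learning is cheap, which is one reason ESNs suit resource-constrained agents such as UAVs.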
Gu et al. [81] studied the uplink interference issue in ultra-dense heterogeneous networks by proposing a hybrid game method for managing interference between macrocells and remote radio heads. Unlike other interference mitigation approaches, the proposed method also preserves the UEs' spectral efficiency and accounts for their energy consumption. Furthermore, the technique guarantees that UEs use only local information and limited interaction to optimize their utility. Thus, sharing power information among UEs is unnecessary, making the method more suitable for complex 5G scenarios.
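The idea of UEs optimizing their own utility from local measurements can be illustrated with a generic non-cooperative power game. This is a swapped-in textbook example, not the hybrid game of [81]: each UE maximizes a hypothetical utility log2(1 + SINR) minus a price per unit power, and its best response depends only on the interference-plus-noise it measures, so no power information is exchanged. The gains, noise, and price are assumptions.

```python
import math


def best_response_power_game(gain, noise, price, p_max, iters=200, tol=1e-12):
    """Iterated best responses of a non-cooperative uplink power game.

    gain[i][j] is the gain from UE j's transmitter to receiver i.
    UE i maximizes log2(1 + g_ii p_i / I_i) - price * p_i, whose stationary
    point is p_i = 1 / (price * ln 2) - I_i / g_ii, clipped to [0, p_max].
    """
    n = len(gain)
    p = [0.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            # Locally measured interference-plus-noise (no exchange of powers).
            interference = noise + sum(gain[i][j] * p[j] for j in range(n) if j != i)
            br = 1.0 / (price * math.log(2)) - interference / gain[i][i]
            new_p.append(min(p_max, max(0.0, br)))
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


gain = [[1.0, 0.1, 0.05], [0.08, 0.9, 0.1], [0.05, 0.1, 1.2]]
print(best_response_power_game(gain, noise=0.1, price=2.0, p_max=1.0))
```

With weak cross gains, as here, the best-response iteration is a contraction and converges to a Nash equilibrium; strongly coupled links may need damping or other safeguards.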

B. RULE-BASED APPROACHES
Enhanced Interference Mitigation and Traffic Adaptation (eIMTA) is a feature that enables TD-LTE systems to handle asymmetric and dynamic uplink and downlink capacity requirements [82]. eIMTA allows the configured TDD uplink/downlink pattern to be changed dynamically. Because a small cell typically serves only a small number of simultaneously active UEs, whose uplink and downlink demands fluctuate, eIMTA-based dynamic TDD is an attractive way to adapt the small cell's TDD uplink/downlink subframe configuration. It could therefore greatly improve user experience, especially under low to medium load [83].
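eIMTA reconfigures a cell among the seven LTE TDD uplink/downlink configurations defined in 3GPP TS 36.211. The sketch below lists those patterns and applies a hypothetical selection rule: pick the configuration whose downlink share of non-special subframes best matches the ratio of buffered downlink to uplink traffic. Actual eIMTA reconfiguration decisions are implementation-specific.

```python
# LTE TDD uplink/downlink configurations 0-6 (3GPP TS 36.211, frame structure
# type 2): D = downlink, S = special, U = uplink subframe, 10 subframes/frame.
LTE_TDD_CONFIGS = {
    0: "DSUUUDSUUU",
    1: "DSUUDDSUUD",
    2: "DSUDDDSUDD",
    3: "DSUUUDDDDD",
    4: "DSUUDDDDDD",
    5: "DSUDDDDDDD",
    6: "DSUUUDSUUD",
}


def downlink_share(pattern):
    """Fraction of non-special subframes that carry downlink."""
    d = pattern.count("D")
    u = pattern.count("U")
    return d / (d + u)


def select_tdd_config(dl_traffic, ul_traffic):
    """Hypothetical rule: match the DL share of the frame to the DL share of traffic."""
    target = dl_traffic / (dl_traffic + ul_traffic)
    return min(LTE_TDD_CONFIGS,
               key=lambda c: abs(downlink_share(LTE_TDD_CONFIGS[c]) - target))


print(select_tdd_config(dl_traffic=900, ul_traffic=100))  # 5
print(select_tdd_config(dl_traffic=500, ul_traffic=500))  # 1
```

In practice, a rule like this would be combined with interference constraints, since neighbouring cells choosing different configurations is exactly what creates the cross-link interference discussed in this section.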
Beyond eIMTA, Guo et al. proposed interference suppression schemes using advanced receivers, such as the minimum mean square error interference rejection combining (MMSE-IRC) receiver and an enhanced MMSE-IRC (eMMSE-IRC) receiver [20]. In its simplest form, such a receiver treats the interference as noise and suppresses it linearly. These receivers are commonly known as Interference Suppression (IS) receivers, an example of which is the MMSE-IRC receiver [84]. In this work, a Raspberry Pi was used as a server linking several clients on the same network, and experiments were performed for three cases of client-server communication using ZigBee. In power-based interference mitigation, cross-link interference is mitigated by reducing the downlink transmission power in flexible downlink subframes that potentially cause eNB-to-eNB interference, and by boosting the uplink transmission power in flexible uplink subframes that are potentially affected by eNB-to-eNB interference. As a result, the system improved the SINR and increased the uplink throughput gain with a smaller performance gap.
As stated before, UDN users require specific QoS requirements to be fulfilled. One solution for improving SINR in a UDN is to employ beam steering at both the transmitter and the receiver. However, a trade-off between throughput and antenna directivity is needed to minimize intercell interference. Therefore, Celik et al. [85] introduced a scheduling model that accounts for base-station-to-base-station interference, calculated by considering the traffic and the transmission environment. In their method, zero-forcing (ZF) precoding is used at the UEs and base stations for both uplink and downlink transmission. However, the downside of this approach is that the interference estimate depends on offline calculation of base-station-to-base-station radio propagation.
Therefore, non-cooperative dynamic TDD limits the performance and requires additional support to avoid interference.
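The gain of an interference rejection combining receiver over a receiver that ignores the interference structure can be checked numerically. The sketch below assumes two receive antennas, one strong interferer (for example, a cross-link transmitter), and random Rayleigh-like channels. It computes the MMSE-IRC weights in the equivalent form R_in^{-1} h, where R_in is the interference-plus-noise covariance, and compares the resulting SINR with maximum ratio combining (MRC).

```python
import numpy as np


def output_sinr(w, h_desired, r_in):
    """Post-combining SINR: |w^H h|^2 / (w^H R_in w)."""
    num = np.abs(np.vdot(w, h_desired)) ** 2
    den = np.real(np.vdot(w, r_in @ w))
    return num / den


def interference_plus_noise_cov(h_interferers, noise_var, n_rx):
    """R_in = sum_k h_k h_k^H + noise_var * I."""
    r_in = noise_var * np.eye(n_rx, dtype=complex)
    for h in h_interferers:
        r_in += np.outer(h, h.conj())
    return r_in


def irc_weights(h_desired, r_in):
    """MMSE-IRC combining weights up to a scalar factor: w = R_in^{-1} h."""
    return np.linalg.solve(r_in, h_desired)


def mrc_weights(h_desired):
    """Maximum ratio combining ignores the interference structure: w = h."""
    return h_desired


rng = np.random.default_rng(0)
n_rx = 2
h_d = (rng.normal(size=n_rx) + 1j * rng.normal(size=n_rx)) / np.sqrt(2)
# One strong interferer, e.g. a base station transmitting in the opposite direction.
h_i = [3.0 * (rng.normal(size=n_rx) + 1j * rng.normal(size=n_rx)) / np.sqrt(2)]
r_in = interference_plus_noise_cov(h_i, 0.1, n_rx)
print(output_sinr(irc_weights(h_d, r_in), h_d, r_in),
      output_sinr(mrc_weights(h_d), h_d, r_in))
```

With two antennas and a single dominant interferer, IRC can place a near-null on the interferer, which MRC cannot; the IRC SINR equals h^H R_in^{-1} h, the maximum achievable by any linear combiner.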
To obtain the transmission efficiency advantage of dynamic TDD, it is necessary to deal with the cross-link interference generated between cells transmitting in opposite directions. Therefore, a bi-directional beamforming sum-power minimization algorithm was proposed to resolve this cross-link interference by minimizing the total uplink and downlink transmit power subject to minimum SINR constraints [86]. The authors suggested two iterative approaches to solve the beamforming problem. The first is a centralized solution that requires global channel state information. The second is carried out in a decentralized fashion based on the alternating direction method of multipliers (ADMM), which requires only local channel state information and reduces the signaling load. Both methods converge to the same solution, and the decentralized one can approximate the optimal solution within a limited number of iterations. Thus, cross-link interference in dynamic TDD can be managed with low power consumption while the SINR requirements of both downlink and uplink users are guaranteed. Table 5 summarizes related works on machine learning and algorithm-based techniques for interference mitigation, with their focus, contributions, and key methods.
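The sum-power minimization idea of [86] can be illustrated in a simplified setting. The sketch below replaces beamforming with single-antenna power control, a deliberate simplification: given a gain matrix mixing uplink and downlink links (so off-diagonal terms include cross-link interference), the smallest power vector that meets every SINR target solves a linear system, and it exists only when the spectral radius of the normalized interference matrix is below 1. The gain values are hypothetical.

```python
import numpy as np


def min_power_for_sinr_targets(gain, noise, targets):
    """Component-wise minimal transmit powers that meet every SINR target.

    gain[i, j] is the gain from transmitter j to receiver i. Links may be
    uplink or downlink, so off-diagonal entries include cross-link
    (BS-to-BS and UE-to-UE) interference. Returns None if infeasible.
    """
    gain = np.asarray(gain, dtype=float)
    targets = np.asarray(targets, dtype=float)
    d = np.diag(targets / np.diag(gain))   # gamma_i / G_ii
    f = gain - np.diag(np.diag(gain))      # interference gains only
    a = d @ f
    # A finite solution exists iff the spectral radius of D F is below 1.
    if np.max(np.abs(np.linalg.eigvals(a))) >= 1.0:
        return None
    # SINR_i >= gamma_i  <=>  p >= D F p + D n; the bound is tight at the minimum.
    n = len(targets)
    return np.linalg.solve(np.eye(n) - a, d @ np.full(n, noise))


def achieved_sinr(gain, noise, p):
    """SINR of every link for power vector p."""
    gain = np.asarray(gain, dtype=float)
    signal = np.diag(gain) * p
    interference = (gain - np.diag(np.diag(gain))) @ p
    return signal / (noise + interference)


g = [[1.0, 0.1, 0.2], [0.2, 0.8, 0.1], [0.1, 0.3, 1.0]]
p = min_power_for_sinr_targets(g, 0.01, [2.0, 2.0, 2.0])
print(p, achieved_sinr(g, 0.01, p))
print(min_power_for_sinr_targets(g, 0.01, [10.0, 10.0, 10.0]))  # None: infeasible
```

At the minimum, every SINR constraint is met with equality, mirroring the guarantee that both downlink and uplink users receive their required SINR at the lowest total power.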
Lee et al. [90] addressed the interference generated by the coexistence of uplink and downlink transmissions, which is relevant to the UDN scenario. They developed zero-forcing beamforming (ZFBF) for uplink and downlink transmission to minimize the mean rate loss between perfect channel knowledge and limited feedback. The authors also proposed an uplink transmit power allocation scheme subject to an interference constraint toward downlink users. To address the feedback bit allocation and uplink relay resource allocation problems, they used an iterative scheme that refines the interference power threshold to improve the downlink data rate.
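A zero-forcing beamformer of the type used in [90] inverts the multi-user channel so that each stream arrives without inter-user interference. The minimal sketch below assumes perfect channel knowledge and a random channel with 3 users and 4 transmit antennas; the limited-feedback quantization studied in [90] is not modelled.

```python
import numpy as np


def zf_precoder(h, total_power=1.0):
    """Zero-forcing precoder for a K x M channel (K users, M >= K antennas).

    W = H^H (H H^H)^{-1}, so H @ W is diagonal: each user receives no
    interference from the other users' streams. W is scaled to total_power.
    """
    w = h.conj().T @ np.linalg.inv(h @ h.conj().T)
    w *= np.sqrt(total_power / np.real(np.trace(w @ w.conj().T)))
    return w


rng = np.random.default_rng(0)
h = (rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))) / np.sqrt(2)
w = zf_precoder(h, total_power=2.0)
print(np.round(np.abs(h @ w), 6))  # off-diagonal entries are numerically zero
```

With quantized (limited) feedback, the precoder is computed from an imperfect channel estimate, so the off-diagonal terms no longer vanish; that residual interference is the source of the rate loss analysed in [90].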
In Seng et al. [91], the hierarchical TDD interference problem in UDNs is addressed with a clustering approach based on the Chameleon algorithm. With this clustering technique, the performance parameters within each cluster are the same. Dynamic Resource Allocation (DRA) is implemented in small cells to meet user demand criteria such as QoS, data rate, and delay while deploying less network equipment. Finally, the authors implemented multicell beamforming (MBF) in small cells that use the same frame configuration within each cluster to further minimize intercell interference (ICI).
Another study by Celik et al. [88] proposed a scheduler that uses offline base-station-to-base-station measurements to estimate the actual interference under different traffic conditions, reducing the signaling needed for channel measurement and feedback. Hence, once the network is operational, no CSI is required. The authors also showed that signaling could be reduced further by considering only the relevant interferers whose received power exceeds a given level when exchanging traffic information.
A scheduling function was also introduced that converts interference into individual base station activation probabilities. The results indicate that, compared with a power control technique, the scheduling method handles interference between nearby links in very dense networks more effectively.
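The scheduler of [88] turns interference into per-base-station activation probabilities. The exact mapping is not given here, so the sketch below uses a hypothetical logistic-style function: interference at each base station is predicted from offline base-station-to-base-station gains weighted by neighbour traffic load, and each base station then activates independently with the resulting probability. All names, gains, and the reference level are assumptions.

```python
import random


def predicted_interference(bs, offline_gain, tx_power_w, traffic_load):
    """Expected interference at `bs`: offline BS-to-BS gains weighted by each
    neighbour's transmit power and (hypothetical) probability of being active."""
    return sum(offline_gain[bs][nb] * tx_power_w * traffic_load[nb]
               for nb in offline_gain[bs] if nb != bs)


def activation_probability(interference_w, reference_w, steepness=2.0):
    """Hypothetical mapping: 0.5 when I == I_ref, decreasing as I grows."""
    return 1.0 / (1.0 + (interference_w / reference_w) ** steepness)


def schedule(offline_gain, tx_power_w, traffic_load, reference_w, seed=0):
    """Each BS independently activates with its interference-derived probability."""
    rng = random.Random(seed)
    probs = {bs: activation_probability(
                 predicted_interference(bs, offline_gain, tx_power_w, traffic_load),
                 reference_w)
             for bs in offline_gain}
    active = [bs for bs in sorted(probs) if rng.random() < probs[bs]]
    return probs, active


offline_gain = {"A": {"B": 1e-9, "C": 1e-11},
                "B": {"A": 1e-9, "C": 1e-10},
                "C": {"A": 1e-11, "B": 1e-10}}
load = {"A": 0.5, "B": 1.0, "C": 0.2}
probs, active = schedule(offline_gain, 1.0, load, reference_w=5e-10)
print(probs, active)
```

Because the inputs come from offline measurements and exchanged traffic loads, no instantaneous CSI is needed at scheduling time, consistent with the signaling reduction described above.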
To achieve consistent coordinated scheduling, Lukowa et al. [89], [92] proposed a cluster-based scheduling scheme in which the uplink/downlink direction is selected jointly and users within a cluster are scheduled cooperatively. MIMO rank and rate adaptation is coordinated across clusters, whereas coordinated uplink/downlink scheduling within each cluster relies on interference-aware TDD switching. Successively paired rank allocation (SPARK) is used for inter-cluster rank coordination. This algorithm improves interference mitigation and reduces packet delay, as shown in Fig. 9.
To improve the downlink network sum-rate, Muta et al. [87] proposed an uplink pilot allocation scheme that manages downlink cross-tier interference to small cell users based on estimated uplink channel state information. In this approach, the optimal pilot allocation balances two degradation factors: uplink pilot overhead and downlink cross-tier interference. The authors also presented a dynamic small cell base station clustering scheme to minimize the dominant co-tier interference among small cells, where clustering is based on the potential mutual co-tier interference intensity between two cells. Within each cluster, a small cell base station precoder design further boosts the downlink sum-rate of the cells under the small cell base station power constraint.
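Clustering small cells by mutual co-tier interference, as in [87], can be sketched with a simple threshold rule. The version below is an assumption rather than the authors' algorithm: any two cells whose mutual interference metric exceeds a threshold are placed in the same cluster (connected components computed with union-find), so strongly coupled cells end up coordinated together. The coupling values are hypothetical.

```python
def cluster_small_cells(coupling, threshold):
    """Group cells whose mutual interference exceeds `threshold`.

    coupling[i][j] is a symmetric co-tier interference metric between cells
    i and j. Clusters are the connected components of the "strong coupling"
    graph, found with a union-find structure.
    """
    n = len(coupling)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if coupling[i][j] > threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


coupling = [[0.0, 0.9, 0.1, 0.1, 0.1],
            [0.9, 0.0, 0.7, 0.1, 0.1],
            [0.1, 0.7, 0.0, 0.1, 0.1],
            [0.1, 0.1, 0.1, 0.0, 0.8],
            [0.1, 0.1, 0.1, 0.8, 0.0]]
print(cluster_small_cells(coupling, threshold=0.5))  # [[0, 1, 2], [3, 4]]
```

A practical scheme would usually also cap cluster sizes, since joint precoding cost grows with the number of coordinated cells.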

V. ISSUES AND CHALLENGES
First, dynamic symbol-level prediction of uplink and downlink allocation requires large processing power to provide concise and accurate predictions and to manage the process efficiently. One promising solution is MEC, which is considered an important future service for realizing 5G networks and the Internet of Things. MEC provides computation and communication resources to mobile devices by connecting users to servers located at the edge of the network, which is especially beneficial for real-time applications that require minimal delay [26].
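Whether offloading to an MEC server actually reduces delay depends on the balance between upload time and computing speed. The sketch below uses a common simplified latency model with hypothetical task and link parameters: a task is offloaded only if uploading its input and executing it at the edge is faster than executing it locally.

```python
def local_latency_s(cycles, f_local_hz):
    """Time to execute the task on the mobile device."""
    return cycles / f_local_hz


def offload_latency_s(input_bits, uplink_bps, cycles, f_edge_hz, result_return_s=0.0):
    """Upload time + edge execution time (+ optional time to return the result)."""
    return input_bits / uplink_bps + cycles / f_edge_hz + result_return_s


def should_offload(input_bits, cycles, f_local_hz, f_edge_hz, uplink_bps):
    """Offload only if the edge path is strictly faster than local execution."""
    return (offload_latency_s(input_bits, uplink_bps, cycles, f_edge_hz)
            < local_latency_s(cycles, f_local_hz))


# Hypothetical task: 1e9 CPU cycles and 2 Mbit of input data.
print(should_offload(2e6, 1e9, 1e9, 10e9, 20e6))  # True: 0.2 s at the edge vs 1.0 s locally
print(should_offload(2e6, 1e9, 1e9, 10e9, 1e6))   # False: the 2 s upload dominates
```

The second case shows why MEC gains depend on radio resource management: a poor uplink rate can erase the benefit of faster edge computation.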
MEC users remain in the coverage area of an MEC service provider for only a limited time, which makes user demands diverse. Different types of users come and go during the day and request a variety of services, and these services change rapidly to suit user requirements. Therefore, services need to be developed rapidly and cost-effectively, which is challenging given the varying nature of emerging services.
VOLUME 9, 2021
Regarding energy efficiency, small cells are currently deployed in 5G systems to overcome problems related to short transmission range. However, more base stations in an area lead to higher energy consumption for operating and maintaining the infrastructure. This consumption has several components. First, wireless signal transmission consumes substantial energy, especially in the power amplifiers and in the RF chains that convert signals between baseband and radio frequency [93].
The transmission power also includes the energy consumed by feeders. Second, there is computing power, which covers digital signal processing, base station management and control functions, and communication functions between the core network and base stations. Third, additional power is lost in converting power from the main supply, for example in DC-DC power supplies, and is consumed by active cooling systems at base stations. This critical issue is frequently overlooked in 5G NR. Worse still, these small cells are expected to operate with SDN and network functions virtualization (NFV), which require powered infrastructure to support future demand. A solution for reducing the energy consumption of small cell base stations is therefore critical.
Security issues in MEC are also worth exploring, as 5G NR system security needs to be better protected [94]. One issue arises when an isolated system for the mobile device is installed on the MEC server: this isolated system is exposed to intruders while its files are transferred over the wireless network. In addition, security concerns arise when different users share the same physical resources. Applications can be transferred securely through encryption; however, encrypting and decrypting application code adds delay during execution, degrading application performance.
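The base station energy components described above are often summarized with a linear power model, in which a load-independent term covers signal processing, power supply, and cooling losses, and a load-dependent term grows with the radiated power. The sketch below follows that EARTH-style form; the default parameter values are illustrative placeholders for a small cell, not measurements from this survey, and treating an unloaded cell as sleeping is an assumption.

```python
def bs_input_power_w(p_out_w, p0_w=6.8, delta_p=4.0, p_max_w=0.13,
                     p_sleep_w=4.3, n_trx=1):
    """Linear (EARTH-style) base station power model, in watts.

    p0_w    : load-independent consumption per transceiver chain
              (signal processing, power supply and cooling losses)
    delta_p : slope of the load-dependent part (mainly the power amplifier)
    Default values are illustrative placeholders for a small cell.
    """
    if not 0.0 <= p_out_w <= p_max_w:
        raise ValueError("p_out_w must lie in [0, p_max_w]")
    if p_out_w == 0.0:
        return n_trx * p_sleep_w  # assumption: an unloaded cell is put to sleep
    return n_trx * (p0_w + delta_p * p_out_w)


def network_energy_kwh(rf_outputs_w, hours):
    """Energy drawn by a set of small cells over a period, in kWh."""
    return sum(bs_input_power_w(p) for p in rf_outputs_w) * hours / 1000.0


# One fully loaded and one idle small cell over a day.
print(bs_input_power_w(0.13), bs_input_power_w(0.0))
print(network_energy_kwh([0.13, 0.0], hours=24))
```

The large constant term relative to the radiated power is why densification raises total energy use, and why sleep modes for lightly loaded small cells are an attractive lever.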
The fairness problem in 5G networks should also be addressed. As stated by Ming et al. [95], system performance is maximized when the load is approximately equal across carrier components, whereas unbalanced traffic across carrier components results in underused spectrum resources. When the load between the macrocell and a small cell is unbalanced, they can dynamically adjust their spectrum shares or offload traffic from the small cell to the macrocell [95]. In a nutshell, efficient resource management is crucial in deploying 5G NR; otherwise, it will result in significant QoS degradation and wasted resources. Table 6 summarizes the issues and challenges identified for future 5G system improvement.
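Load imbalance across carrier components can be quantified with Jain's fairness index, which equals 1 for perfectly equal loads and 1/n when a single carrier out of n carries all the traffic. The index is a standard metric chosen here for illustration; it is not necessarily the measure used in [95].

```python
def jain_fairness(loads):
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means perfectly equal load."""
    total = sum(loads)
    squares = sum(x * x for x in loads)
    if squares == 0:
        return 1.0  # no load anywhere is trivially balanced
    return total * total / (len(loads) * squares)


print(jain_fairness([0.6, 0.6]))  # 1.0: balanced carrier components
print(jain_fairness([0.9, 0.3]))  # about 0.8: one carrier is overloaded
```

Offloading traffic or shifting spectrum shares, as described above, moves this index toward 1.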

VI. CONCLUSION
Realizing a technology that copes with the vast growth in connected devices, data demand, and QoS requirements is a major challenge in 5G. This paper provides a detailed survey of radio resource management for 5G resource allocation, energy efficiency, and interference mitigation using machine learning and rule-based approaches. The focus, contributions, and critical features of each work are highlighted, together with the challenges and future directions of 5G resource management. MEC is a potential means of meeting the requirements set for 5G, but its security and power-efficient deployment issues must be addressed. In addition, providing high SINR and QoS to end users will require a reliable interference mitigation solution that covers multiple layers of the system. It is expected that this comprehensive survey will help researchers develop ideas to address the essential issues in further improving 5G resource management.