Abstract:
Recently, Deep Reinforcement Learning (DRL) has increasingly been used to solve complex problems in mobile networks. DRL models fall into two main types, off-policy and on-policy, and each has its advantages: off-policy models can improve sample efficiency, while on-policy models are generally easier to implement and offer stable performance. It is therefore hard to decide which model is appropriate for a given scenario. In this paper, we compare an on-policy model, Proximal Policy Optimization (PPO), with an off-policy model, Sample Efficient Actor-Critic with Experience Replay (ACER), on a resource allocation problem for an application with stringent Quality of Service (QoS) requirements. Results show that, for an Open Radio Access Network (O-RAN) with latency-sensitive and latency-tolerant users, both DRL models outperform a greedy algorithm. We also show that the on-policy model guarantees a good trade-off between energy consumption and users' latency, while the off-policy model converges faster.
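The abstract compares PPO and ACER against a greedy baseline for allocating radio resources between latency-sensitive and latency-tolerant users. The sketch below is a hypothetical, minimal illustration of such a setup and of the greedy baseline; the user set, resource budget, reward shape, energy cost, and greedy rule are assumptions for illustration only and are not taken from the paper. A PPO or ACER agent would replace greedy_policy() with a learned allocation policy.

```python
# Hypothetical toy resource-allocation step with latency-sensitive and
# latency-tolerant users, plus a greedy baseline (illustrative assumptions).
import random

N_RBS = 10                 # resource blocks available per step (assumed)
USERS = [                  # (name, latency_sensitive, demand in RBs) -- illustrative only
    ("u1", True, 4),
    ("u2", True, 3),
    ("u3", False, 5),
    ("u4", False, 2),
]

def greedy_policy(users, n_rbs):
    """Serve latency-sensitive users first, then fill remaining capacity."""
    allocation = {}
    for name, sensitive, demand in sorted(users, key=lambda u: not u[1]):
        give = min(demand, n_rbs)
        allocation[name] = give
        n_rbs -= give
    return allocation

def step_reward(allocation, users, energy_per_rb=0.1):
    """Assumed reward shape: served demand, minus an energy cost per
    allocated block, minus a latency penalty for under-served sensitive users."""
    reward = 0.0
    for name, sensitive, demand in users:
        served = allocation.get(name, 0)
        reward += served
        if sensitive and served < demand:
            reward -= 2.0 * (demand - served)   # latency violation penalty
        reward -= energy_per_rb * served        # energy consumption cost
    return reward

if __name__ == "__main__":
    alloc = greedy_policy(USERS, N_RBS)
    print("greedy allocation:", alloc, "reward:", round(step_reward(alloc, USERS), 2))
```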
Date of Conference: 10-13 April 2022
Date Added to IEEE Xplore: 16 May 2022