Abstract:
In this paper, we propose practical model-based policy optimization (PMBPO) to address the time-efficiency issue caused by overly frequent model updates in recent probabilistic model-based reinforcement learning (MBRL) methods that accelerate learning by generating samples from the model. PMBPO enhances the reliability of the generated samples by introducing an expressive probabilistic model that focuses on the system's dynamic features over continuous time steps. A time-efficient learning framework is proposed in which the model is updated offline and interacted with at the end of each epoch. A policy fallback mechanism is further designed to mitigate the negative impact of model bias on the learned policy. Evaluated on five MuJoCo control benchmarks and one quadruped robot control scenario, PMBPO reduces the one-step computation time by 90% while achieving 70% more cumulative reward than state-of-the-art MBRL approaches, extending the feasibility of MBRL in practical control scenarios. The code of PMBPO is available at https://github.com/mrjun123/PMBPO.

Note to Practitioners—Recent model-based reinforcement learning methods such as MBPO, which learn tasks efficiently by generating samples from an approximated system model, are difficult to apply in practical control scenarios because overly frequent model updates cause extensive system pauses. This paper proposes a practical MBPO variant tailored to control problems. It reduces the computational burden of real-time learning while surpassing recent MBPO approaches in learning efficiency and control capability.
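The sketch below illustrates, at a high level, the epoch-level loop described in the abstract: interact with the real system for a full epoch, update the dynamics model offline at the end of the epoch, optimize the policy on model-generated samples, and fall back to the previous policy if model bias degrades performance. It is a minimal sketch based only on this description, not PMBPO's actual implementation; the callables `collect_epoch`, `rollout_model`, and `evaluate`, and the buffer/policy methods, are assumed placeholders.

```python
from copy import deepcopy

def pmbpo_epoch_loop(env, dynamics_model, policy, buffer,
                     num_epochs, model_rollouts_per_epoch,
                     collect_epoch, rollout_model, evaluate):
    """Hypothetical epoch-level loop following the abstract's description.

    `collect_epoch`, `rollout_model`, and `evaluate` are placeholder callables
    (real-system data collection, model-based rollout generation, policy
    evaluation); they are assumptions, not part of PMBPO's published API.
    """
    for _ in range(num_epochs):
        # Interact with the real system for a full epoch; the dynamics model
        # is not updated during interaction, avoiding frequent system pauses.
        buffer.add_real(collect_epoch(env, policy))

        # Offline model update at the end of the epoch.
        dynamics_model.fit(buffer.real_samples())

        # Snapshot the policy and its performance before training on
        # model-generated data, so it can be restored if model bias hurts it.
        snapshot = deepcopy(policy)
        return_before = evaluate(env, policy)

        # Generate synthetic samples from the updated model and use them
        # (mixed with real data) to optimize the policy.
        for _ in range(model_rollouts_per_epoch):
            buffer.add_model(rollout_model(dynamics_model, policy,
                                           buffer.sample_states()))
            policy.update(buffer.sample_mixed())

        # Policy fallback: revert if the updated policy performs worse.
        if evaluate(env, policy) < return_before:
            policy = snapshot
    return policy
```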
Published in: IEEE Transactions on Automation Science and Engineering (Volume: 22)