Transferring Meta-Policy From Simulation to Reality via Progressive Neural Network


Abstract:

Deep reinforcement learning has achieved great success in many challenging domains. However, sample efficiency and safety issues still prevent deep reinforcement learning from being applied directly in robotics. Sim-to-real transfer learning is one feasible way to tackle these problems and address the reality gap between simulation and reality. In this letter, we propose to combine meta-reinforcement learning and progressive neural networks (PNN) by meta-training a policy over multiple source tasks and transferring it to a real-world robot via PNN (MetaPNN). By training the meta-policy over meta-tasks without explicitly modeling the dynamics discrepancy, our method is expected to bridge the gap between simulation and reality under mismatched dynamics, and to let the agent learn a single policy that solves multiple tasks rather than dedicating one PNN policy network to each task. Meanwhile, the meta-policy transferred via PNN is expected to solve the target task and adapt to new situations at the same time. Our results on a variety of target tasks in AntPos and Reach with a simulated manipulator show that MetaPNN significantly improves the robot's learning efficiency and performance. Further results on real-world Reach tasks with a physical robot arm, and on a new task that differs from the meta-tasks, suggest a synergy between meta-learning and PNN.
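As a rough illustration of the transfer step described above, the sketch below (PyTorch; the module, layer sizes, and variable names are our own assumptions, not the authors' released code) freezes a meta-trained policy column and adds a second trainable column that receives lateral connections from the frozen column's hidden activations, in the spirit of progressive neural networks:

```python
import torch
import torch.nn as nn

class ProgressiveColumn(nn.Module):
    """One column of a two-column progressive network.

    The first (frozen) column holds the meta-trained policy; the second
    column is trained on the target task and receives lateral inputs
    from the first column's hidden activations.
    """
    def __init__(self, obs_dim, act_dim, hidden=64, lateral=False):
        super().__init__()
        self.fc1 = nn.Linear(obs_dim, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, act_dim)
        # Lateral adapters map the previous column's hidden activations
        # into this column's layers (only present in the new column).
        self.lat2 = nn.Linear(hidden, hidden) if lateral else None
        self.lat_out = nn.Linear(hidden, act_dim) if lateral else None

    def forward(self, obs, prev_h1=None, prev_h2=None):
        h1 = torch.relu(self.fc1(obs))
        h2 = self.fc2(h1)
        if self.lat2 is not None:
            h2 = h2 + self.lat2(prev_h1)      # lateral connection into layer 2
        h2 = torch.relu(h2)
        a = self.out(h2)
        if self.lat_out is not None:
            a = a + self.lat_out(prev_h2)     # lateral connection into output
        return a, h1, h2

obs_dim, act_dim = 17, 6                      # placeholder sizes
meta_col = ProgressiveColumn(obs_dim, act_dim)    # meta-trained in simulation
# ... load meta-trained weights here, then freeze the column ...
for p in meta_col.parameters():
    p.requires_grad = False

# New column, trained on the real robot, reusing the frozen features.
target_col = ProgressiveColumn(obs_dim, act_dim, lateral=True)

obs = torch.randn(1, obs_dim)
with torch.no_grad():
    _, h1, h2 = meta_col(obs)                 # frozen meta-policy features
action, _, _ = target_col(obs, prev_h1=h1, prev_h2=h2)
```

During target-task training only the parameters of `target_col` receive gradients, so the meta-policy cannot be degraded by real-world fine-tuning; the lateral adapters decide how much of the frozen features to reuse.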
Published in: IEEE Robotics and Automation Letters ( Volume: 9, Issue: 4, April 2024)
Page(s): 3696 - 3703
Date of Publication: 27 February 2024


I. Introduction

Reinforcement learning (RL) aims to obtain optimal policies that maximize the expected accumulated reward by interacting with the environment via trial and error [1]. Recently, due to advances in deep learning (DL), deep reinforcement learning has achieved great success and has been applied to many challenging problems, such as video games [2], Go [3], the Watson DeepQA system [4], autonomous driving [5], [6], multi-robot systems [7], etc. However, there are two main challenges in training a deep reinforcement learning robot directly in the real world [8]. The first is that learning an optimal policy for a real-world robot generally requires millions of samples, which can take several months to collect because task executions in the real world are expensive and time-consuming. The second is that a deep reinforcement learning robot may damage itself or living things in its surroundings while exploring via trial and error [9], [10].
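For concreteness, the objective referenced above can be written as follows (our notation, not taken from the letter): the agent seeks a policy that maximizes the expected discounted return over trajectories, with discount factor gamma and per-step reward r(s_t, a_t):

```latex
\pi^{*} \;=\; \arg\max_{\pi} \, J(\pi),
\qquad
J(\pi) \;=\; \mathbb{E}_{\tau \sim \pi}\!\left[\,\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\right],
\quad \gamma \in (0, 1].
```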
