
Structure-Aware Policy to Improve Generalization among Various Robots and Environments


Abstract:

Recently, Deep Reinforcement Learning (DRL) has been used to solve complex robot control tasks with outstanding success. However, previous DRL methods still have some shortcomings, such as poor generalization performance, which makes policy performance quite sensitive to small variations of the task settings. Besides, it is time-consuming and computationally expensive to retrain a new policy from scratch for each new task, which restricts the applications of DRL-based methods in the real world. In this work, we propose a novel DRL generalization method called GNN-embedding, which incorporates the robot hardware and the environment simultaneously with a GNN-based policy network and learnable embedding vectors of tasks. Thus, it can learn a unified policy for different robots under different environment conditions, which improves the generalization performance of existing DRL robot policies. Multiple experiments on the Hopper-v2 robot are conducted. The experimental results demonstrate the effectiveness and efficiency of GNN-embedding on generalization, including multi-task learning and transfer learning problems.
Date of Conference: 05-09 December 2022
Date Added to IEEE Xplore: 18 January 2023
Conference Location: Jinghong, China


I. Introduction

With the growing interest and development in deep reinforcement learning (DRL), we have seen remarkable progress and achievements of DRL in various application scenarios, including video games [1], autonomous vehicles [2], and robotics control [3]–[5]. Despite the huge success, existing DRL-based methods are still quite limited in generalization [6], which restricts real-world applications of DRL methods, especially in robot control tasks. The policy highly depends on the parameter settings of the task, so it can only learn to control a single robot in a single environment at one time. For example, given robot control tasks with different robot hardware implementations (link length, etc.) and environment features (friction coefficient, etc.), we are required to train multiple policies despite the similarity among these tasks, which is quite computationally expensive and time-consuming. Therefore, it is essential to design a unified DRL method that can be utilized across various robots and environments simultaneously.
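For concreteness, a task in this setting can be viewed as one combination of robot hardware parameters and environment characteristics drawn from a pool. The following minimal sketch (the parameter names and values are illustrative assumptions, not the paper's actual task settings) shows how such a pool might be enumerated:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Task:
    leg_length: float  # robot hardware parameter (e.g. Hopper link length)
    friction: float    # environment feature (e.g. ground friction coefficient)

# Illustrative hardware and environment variants.
leg_lengths = [0.4, 0.5, 0.6]
frictions = [0.7, 1.0, 1.3]

# Each (robot, environment) pair is a separate task; a conventional DRL
# pipeline would train one policy per task, while a unified policy
# aims to cover the whole pool (and transfer beyond it).
task_pool = [Task(l, f) for l, f in product(leg_lengths, frictions)]
print(len(task_pool))  # 9 tasks
```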

Figure: Visualization of the generalization task and the GNN-embedding composition. The task pool contains robots with different hardware parameters and environments with various characteristics. Each task is a combination of a robot and an environment. GNN-embedding is a unified policy for controlling multiple tasks, able to learn both the task-invariant knowledge and the task-specific knowledge.
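As a rough sketch of the composition described in the caption above (this is not the authors' implementation; the module sizes, the message-passing scheme, and names such as GNNEmbeddingPolicy are assumptions), a GNN-based policy conditioned on a learnable per-task embedding could be written in PyTorch as follows:

```python
import torch
import torch.nn as nn

class GNNEmbeddingPolicy(nn.Module):
    """Unified policy sketch: a GNN shared across tasks plus one learnable
    embedding vector per task (robot hardware + environment)."""

    def __init__(self, num_tasks, node_obs_dim, embed_dim=8, hidden_dim=64):
        super().__init__()
        # Task-specific knowledge: one trainable vector per task.
        self.task_embedding = nn.Embedding(num_tasks, embed_dim)
        # Task-invariant knowledge: networks shared by every joint of every robot.
        self.encoder = nn.Linear(node_obs_dim + embed_dim, hidden_dim)
        self.message = nn.Linear(hidden_dim, hidden_dim)
        self.update = nn.GRUCell(hidden_dim, hidden_dim)
        self.actor = nn.Linear(hidden_dim, 1)  # one action (torque) per joint node

    def forward(self, node_obs, adjacency, task_id, num_steps=3):
        # node_obs:  (num_nodes, node_obs_dim) per-joint observations
        # adjacency: (num_nodes, num_nodes) morphology graph of the robot
        # task_id:   scalar LongTensor indexing the current task
        num_nodes = node_obs.shape[0]
        task_vec = self.task_embedding(task_id).expand(num_nodes, -1)
        h = torch.tanh(self.encoder(torch.cat([node_obs, task_vec], dim=-1)))
        for _ in range(num_steps):
            # Aggregate messages from neighbouring joints, then update node states.
            msgs = adjacency @ self.message(h)
            h = self.update(msgs, h)
        return torch.tanh(self.actor(h)).squeeze(-1)  # per-joint actions in [-1, 1]
```

Under this sketch, multi-task learning trains the shared GNN weights and all embedding rows jointly, while transfer to a new robot or environment would amount to adding and fitting one fresh embedding row, which matches the intent of the multi-task and transfer-learning experiments mentioned in the abstract.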

References
[1] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[2] B. R. Kiran, I. Sobh, V. Talpaert, P. Mannion, A. A. Al Sallab, S. Yogamani, and P. Perez, "Deep reinforcement learning for autonomous driving: A survey," IEEE Transactions on Intelligent Transportation Systems, 2021.
[3] J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz, "Trust region policy optimization," in International Conference on Machine Learning. PMLR, 2015, pp. 1889–1897.
[4] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, "Proximal policy optimization algorithms," CoRR, vol. abs/1707.06347, 2017. [Online]. Available: http://arxiv.org/abs/1707.06347
[5] W. Zhao, J. P. Queralta, and T. Westerlund, "Sim-to-real transfer in deep reinforcement learning for robotics: a survey," in 2020 IEEE Symposium Series on Computational Intelligence (SSCI). IEEE, 2020, pp. 737–744.
[6] M. E. Taylor and P. Stone, "Transfer learning for reinforcement learning domains: A survey," Journal of Machine Learning Research, vol. 10, no. 7, 2009.
[7] T. Chen, A. Murali, and A. Gupta, "Hardware conditioned policies for multi-robot transfer learning," Advances in Neural Information Processing Systems, vol. 31, 2018.
[8] T. Wang, R. Liao, J. Ba, and S. Fidler, "NerveNet: Learning structured policy with graph neural networks," in International Conference on Learning Representations, 2018.
[9] W. Huang, I. Mordatch, and D. Pathak, "One policy to control them all: Shared modular policies for agent-agnostic control," in International Conference on Machine Learning. PMLR, 2020, pp. 4455–4464.
[10] T. Erez, Y. Tassa, and E. Todorov, "Infinite-horizon model predictive control for periodic tasks with contacts," Robotics: Science and Systems VII, vol. 73, 2012.
[11] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[12] C. Devin, A. Gupta, T. Darrell, P. Abbeel, and S. Levine, "Learning modular neural network policies for multi-task and multi-robot transfer," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 2169–2176.
[13] L. Pinto and A. Gupta, "Learning to push by grasping: Using multiple tasks for effective learning," in 2017 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017, pp. 2161–2168.
[14] A. Rajeswaran, S. Ghotra, S. Levine, and B. Ravindran, "EPOpt: Learning robust neural network policies using model ensembles," CoRR, vol. abs/1610.01283, 2016. [Online]. Available: http://arxiv.org/abs/1610.01283
[15] M. Gori, G. Monfardini, and F. Scarselli, "A new model for learning in graph domains," in Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, vol. 2, 2005, pp. 729–734.
[16] W. Hamilton, Z. Ying, and J. Leskovec, "Inductive representation learning on large graphs," Advances in Neural Information Processing Systems, vol. 30, 2017.
[17] P. Velickovic, G. Cucurull, A. Casanova, A. Romero, P. Lio, and Y. Bengio, "Graph attention networks," stat, vol. 1050, p. 20, 2017.
[18] J. Bastings, I. Titov, W. Aziz, D. Marcheggiani, and K. Simaan, "Graph convolutional encoders for syntax-aware neural machine translation," arXiv preprint arXiv:1704.04675, 2017.
[19] R. Ying, R. He, K. Chen, P. Eksombatchai, W. L. Hamilton, and J. Leskovec, "Graph convolutional neural networks for web-scale recommender systems," in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 974–983.
[20] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl, "Neural message passing for quantum chemistry," in International Conference on Machine Learning. PMLR, 2017, pp. 1263–1272.
[21] A. Sanchez-Gonzalez, N. Heess, J. T. Springenberg, J. Merel, M. Riedmiller, R. Hadsell, and P. Battaglia, "Graph networks as learnable physics engines for inference and control," in International Conference on Machine Learning. PMLR, 2018, pp. 4470–4479.
[22] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, "OpenAI Gym," arXiv preprint arXiv:1606.01540, 2016.
[23] E. Todorov, T. Erez, and Y. Tassa, "MuJoCo: A physics engine for model-based control," in 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, 2012, pp. 5026–5033.
