Learning Locomotion for Quadruped Robots via Distributional Ensemble Actor-Critic


Abstract:

Domain randomization introduces perturbations in the simulation to make controllers less susceptible to the reality gap, which enables remarkable sim-to-real transfer on real quadruped robots. However, aleatoric uncertainty originating from perturbations could often lead to suboptimal controllers. In this work, we present a novel algorithm called Distributional Ensemble Actor-Critic (DEAC) that blends three ideas: distributional representation of a critic, lower bounds of the value distribution, and ensembling of multiple critics and actors. Distributional representation and ensembling provide reasonable uncertainty estimates, while lower bounds of the value distribution offer finer-grained error control. The simulation results show that the controller trained by DEAC outperforms the other baselines in the domain randomization setting. The trained controller is deployed on an A1-like robot, demonstrating high-speed running and the ability to traverse diverse terrains such as slippery plates, grassland, and wet dirt.
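To illustrate how ensembling and a lower bound can combine into a conservative value estimate, here is a minimal sketch. It is not the DEAC update from the paper: the function name `lower_bound_value`, the quantile representation of each critic, and the rule "ensemble mean minus `beta` times ensemble standard deviation" are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def lower_bound_value(ensemble_quantiles, beta=1.0):
    """Conservative value estimate from an ensemble of distributional critics.

    ensemble_quantiles: one list per critic, each holding that critic's
    quantile estimates of the return for a single state-action pair
    (hypothetical representation, assumed for this sketch).
    beta: how strongly disagreement between critics lowers the estimate
    (hypothetical parameter).
    """
    # Expected return under each critic: average of its quantile estimates.
    critic_means = [mean(quantiles) for quantiles in ensemble_quantiles]
    # Disagreement across critics serves as a rough uncertainty proxy.
    disagreement = pstdev(critic_means)
    # Lower bound: ensemble mean penalized by the disagreement.
    return mean(critic_means) - beta * disagreement


# Two critics with three quantiles each (made-up numbers).
ensemble = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(lower_bound_value(ensemble, beta=1.0))  # prints 2.0
```

With `beta=0` the function returns the plain ensemble mean; larger values of `beta` make the estimate more pessimistic where the critics disagree.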
Published in: IEEE Robotics and Automation Letters (Volume: 9, Issue: 2, February 2024)
Page(s): 1811 - 1818
Date of Publication: 04 January 2024
