Skip to Main Content
Reinforcement learning (RL) applications in robotics are of great interest because of their wide applicability, however many RL applications suffer from large learning costs. We study a new learning-walking scheme where a humanoid robot is embedded with a primitive balancing controller for safety. In this paper, we investigate some RL methods for the walking task. The system has two modes: double stance and single stance, and the selectable action spaces (sub-action spaces) change according to the mode. Thus, a hierarchical RL and a function approximator (FA) approaches are compared in simulation. To handle the sub-action spaces, we introduce the structured FA. The results demonstrate that non-hierarchical RL algorithms with the structured FA is much faster than the hierarchical RL algorithm. The robot can obtain appropriate walking gaits in around 30 episodes (20~30 min), which is considered to be applicable to a real humanoid robot.