Skip to Main Content
This study introduces a method to enable a robot to learn how to perform new tasks through human demonstration and independent practice. The proposed process consists of two interconnected phases; in the first phase, state-action data are obtained from human demonstrations, and an aggregated state space is learned in terms of a decision tree that groups similar states together through reinforcement learning. Without the postprocess of trimming, in tree induction, the tree encodes a control policy that can be used to control the robot by means of repeatedly improving itself. Once a variety of behaviors is learned, more elaborate behaviors can be generated by selectively organizing several behaviors using another Q-learning algorithm. The composed outputs of the organized basic behaviors on the motor level are weighted using the policy learned through Q-learning. This approach uses three diverse Q-learning algorithms to learn complex behaviors. The experimental results show that the learned complicated behaviors, organized according to individual basic behaviors by the three Q-learning algorithms on different levels, can function more adaptively in a dynamic environment.