Abstract:
Recent advances in robot learning leverage large language models (LLMs) and sampling-based task and motion planning (TAMP) modules for automatic and scalable robot data generation. This approach yields success trajectories along with a large number of failure trajectories. Prior works typically filter out the failure data and adopt behavior cloning (BC) to train the policy; however, this significantly reduces sample efficiency and limits the learned policy to the behavior policy used for data collection. In this paper, we introduce a vision-language-conditioned action-sequence diffusion policy together with a Q-guided refinement scheme for its training. We first redefine the reverse process of the diffusion model as the distribution of action sequences conditioned on visual observations and language instructions. We then use BC to pretrain the policy on the success sub-dataset. Next, we optimize an action-sequence Q-value function by minimizing the temporal-difference error over the complete dataset. Finally, we integrate guidance from the Q-value function into the BC loss of the reverse diffusion chain. Our method significantly outperforms baseline methods in success rate and sample efficiency. By effectively leveraging failure data to optimize the policy, it achieves results comparable to those trained on the complete success sub-dataset while requiring 20%-30% less success data.
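To make the training objective described above concrete, the sketch below shows how a BC denoising loss on the reverse diffusion chain could be combined with a Q-guidance term that pushes denoised action sequences toward higher Q-values, in the style of Diffusion-QL. This is a minimal illustrative sketch, not the authors' implementation: all module names (PolicyNet, QNet), dimensions, the noise schedule, and the weighting coefficient alpha are assumptions.

```python
# Minimal sketch (assumption: a Diffusion-QL-style objective) of combining the BC
# diffusion loss with the Q-guided refinement term described in the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 20                     # number of diffusion steps (assumed)
ACT_DIM, HORIZON = 7, 8    # action dimension and action-sequence length (assumed)
COND_DIM = 512             # fused vision-language embedding size (assumed)

betas = torch.linspace(1e-4, 2e-2, T)          # linear noise schedule (assumed)
alphas_cumprod = torch.cumprod(1.0 - betas, 0)

class PolicyNet(nn.Module):
    """Predicts the noise added to an action sequence, conditioned on the
    vision-language embedding and the diffusion timestep."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACT_DIM * HORIZON + COND_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, ACT_DIM * HORIZON))

    def forward(self, noisy_actions, cond, t):
        x = torch.cat([noisy_actions.flatten(1), cond, t.float().unsqueeze(1)], dim=1)
        return self.net(x).view(-1, HORIZON, ACT_DIM)

class QNet(nn.Module):
    """Action-sequence Q-value: scores a whole action sequence under a condition."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ACT_DIM * HORIZON + COND_DIM, 256), nn.ReLU(),
            nn.Linear(256, 1))

    def forward(self, actions, cond):
        return self.net(torch.cat([actions.flatten(1), cond], dim=1))

def q_guided_bc_loss(policy, q_fn, actions, cond, alpha=1.0):
    """BC denoising loss on success data plus a Q-guidance term that favors
    action sequences with higher Q-values (sign and weighting are assumptions)."""
    b = actions.shape[0]
    t = torch.randint(0, T, (b,))
    noise = torch.randn_like(actions)
    a_bar = alphas_cumprod[t].view(b, 1, 1)
    noisy = a_bar.sqrt() * actions + (1 - a_bar).sqrt() * noise
    pred_noise = policy(noisy, cond, t)
    bc_loss = F.mse_loss(pred_noise, noise)
    # Reconstruct the clean action sequence implied by the predicted noise,
    # then score it with the (separately trained, TD-learned) Q-function.
    recon = (noisy - (1 - a_bar).sqrt() * pred_noise) / a_bar.sqrt()
    q_loss = -q_fn(recon, cond).mean()
    return bc_loss + alpha * q_loss

# Usage example with random tensors standing in for a batch of demonstrations.
policy, q_fn = PolicyNet(), QNet()
actions = torch.randn(16, HORIZON, ACT_DIM)
cond = torch.randn(16, COND_DIM)
loss = q_guided_bc_loss(policy, q_fn, actions, cond)
loss.backward()
```

Under these assumptions, the Q-function itself would be fit beforehand by minimizing a temporal-difference error over the complete dataset (successes and failures), and only the policy is updated with the combined loss above.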
Published in: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025