Abstract:
Dialogue policy learning is the core decision-making module of a task-oriented dialogue system. Its primary objective is to help users achieve their goals effectively in as few turns as possible. A practical dialogue-policy agent must be able to expand its knowledge to handle new scenarios efficiently without degrading its existing performance. Nevertheless, when adapting to new tasks, existing dialogue-policy agents often fail to retain their previously acquired knowledge. To overcome this problem, we propose a novel continual dialogue-policy model that tackles the issues of “not forgetting the old” and “acquiring the new” from three aspects: (1) For effective old-task preservation, we introduce the forgetting preventor, which uses a behavior cloning technique to force the agent to take actions consistent with the replayed experience, thereby retaining the policy trained on historic tasks. (2) For new-task acquisition, we introduce the adaption accelerator, which employs an invariant risk minimization mechanism to produce a stable policy predictor that avoids spurious correlations in the training data. (3) To reduce the storage cost of the replayed experience, we introduce a replay manager, which regularly cleans up old data. The effectiveness of the proposed model is evaluated both theoretically and experimentally, with favorable results.
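As a rough illustration of how the three components could combine into one training objective, the sketch below is a minimal pure-Python example and not the paper's implementation. It makes several assumptions: the new-task risk is approximated by a cross-entropy loss (the actual model may use a reinforcement learning objective), the invariance term follows the widely used IRMv1 form (the squared gradient of each environment's risk with respect to a dummy scale fixed at 1), and the replay cleanup simply keeps the most recent items. All function names, parameters, and weights are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_risk(batch):
    """Mean cross-entropy over (logits, action_index) pairs.

    On replayed experience this acts as a behavior cloning loss:
    it is small when the policy reproduces the stored actions.
    """
    return sum(-math.log(softmax(z)[a]) for z, a in batch) / len(batch)

def irm_penalty(batch):
    """IRMv1-style penalty for one environment (assumed form).

    For cross-entropy with logits scaled by w, the gradient at w = 1
    is the mean over samples of sum_k (p_k - 1[k == a]) * z_k.
    The penalty is the square of that gradient.
    """
    grad = 0.0
    for z, a in batch:
        p = softmax(z)
        grad += sum((p[k] - (1.0 if k == a else 0.0)) * z[k]
                    for k in range(len(z)))
    grad /= len(batch)
    return grad ** 2

def trim_replay(buffer, capacity):
    """Hypothetical cleanup rule: keep only the newest `capacity` items."""
    return buffer[-capacity:] if capacity > 0 else []

def continual_objective(new_envs, replay_batch, bc_weight=1.0, irm_weight=1.0):
    """New-task risk + invariance penalty + behavior cloning on replay.

    new_envs: list of batches, one per training environment of the new task.
    replay_batch: batch sampled from stored old-task experience.
    """
    new_risk = sum(cross_entropy_risk(env) for env in new_envs) / len(new_envs)
    penalty = sum(irm_penalty(env) for env in new_envs) / len(new_envs)
    old_risk = cross_entropy_risk(replay_batch)
    return new_risk + irm_weight * penalty + bc_weight * old_risk
```

In this sketch, `bc_weight` trades off retention of old tasks against fit to the new task, and `irm_weight` controls how strongly the predictor is pushed toward being simultaneously optimal across environments.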
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 36, Issue: 12, December 2024)