
Incremental Sparse Bayesian Method for Online Dialog Strategy Learning

Authors: Sungjin Lee and M. Eskenazi (Language Technologies Institute, Carnegie Mellon University, Pittsburgh, PA, USA)

This paper proposes an incremental sparse Bayesian learning method to allow continuous dialog strategy learning from interactions with real users. Since conventional reinforcement learning (RL) methods require a huge number of dialogs to reach convergence, it has been essential to use a simulated user in training dialog policies. The disadvantage of this approach is that the trained dialog policies always lag behind the optimal one for live users. To tackle this problem, a few studies applying online RL methods to dialog management have emerged and shown very promising results. However, these methods are limited to learning the weight parameters of the model's basis functions online, and so they need batch learning on a fixed data set, or heuristics, to find appropriate values for other meta-parameters such as sparsity-controlling thresholds, basis function parameters, and noise parameters. The proposed method attempts to overcome this limitation and achieve fully incremental, fast dialog strategy learning by adopting a sparse Bayesian learning method for value function approximation. To verify the proposed method, three different experimental conditions were used: artificial data, a simulated user, and real users. The experiment on artificial data showed that the proposed method successfully learns all the parameters in an incremental manner. The experiment on training and evaluating dialog policies with a simulated user demonstrated that the proposed method is much faster than conventional RL methods. Finally, a live user study showed that the dialog strategy learned from real users performed as well as the best previous systems, although it slightly underperformed the one trained on simulated dialogs due to the difficulty of eliciting user feedback.
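The sparse Bayesian machinery underlying the abstract can be illustrated with a minimal sketch. The paper's actual method performs incremental sparse Bayesian value function approximation for dialog policies; the code below is not that algorithm, but a small, self-contained example of the sparse Bayesian regression idea it builds on: per-weight precisions (ARD hyperparameters) and the noise precision are re-estimated from the data itself, so that irrelevant basis functions are automatically driven toward zero rather than being pruned by a hand-tuned threshold. All function and variable names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sparse_bayes_regress(Phi, t, n_iter=50, alpha_init=1.0, beta_init=1.0,
                         alpha_cap=1e6):
    """Sparse Bayesian linear regression via MacKay-style fixed-point updates.

    Phi : (N, M) design matrix of basis function activations.
    t   : (N,) targets.
    Returns the posterior mean weights, per-weight precisions alpha,
    and the estimated noise precision beta. Weights whose alpha grows
    toward alpha_cap are effectively pruned (shrunk to ~0).
    """
    N, M = Phi.shape
    alpha = np.full(M, alpha_init)   # per-weight prior precisions (ARD)
    beta = beta_init                 # noise precision, also learned
    for _ in range(n_iter):
        # Gaussian posterior over weights: N(mu, Sigma)
        Sigma = np.linalg.inv(beta * Phi.T @ Phi + np.diag(alpha))
        mu = beta * Sigma @ Phi.T @ t
        # gamma_i in [0, 1]: how well-determined weight i is by the data
        gamma = 1.0 - alpha * np.diag(Sigma)
        # Fixed-point re-estimates of the hyperparameters
        alpha = np.minimum(gamma / np.maximum(mu ** 2, 1e-12), alpha_cap)
        resid = t - Phi @ mu
        beta = (N - gamma.sum()) / (resid @ resid)
    return mu, alpha, beta

# Toy usage: one relevant feature, one irrelevant feature, plus a bias.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                      # irrelevant
t = 2.0 * x1 + 0.1 * rng.normal(size=200)
Phi = np.column_stack([x1, x2, np.ones(200)])
mu, alpha, beta = sparse_bayes_regress(Phi, t)
# mu[0] recovers the true coefficient; mu[1] is shrunk near zero
# because its precision alpha[1] diverges.
```

The key contrast with the batch setting criticized in the abstract is that nothing here is hand-set: the sparsity pattern and the noise level both fall out of the hyperparameter re-estimation, which is the property the paper extends to incremental, per-interaction updates.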

Published in:

IEEE Journal of Selected Topics in Signal Processing (Volume 6, Issue 8)

Date of Publication:

Dec. 2012
