Skip to Main Content
Recent research has shown that effective dialogue management can be achieved through the Partially Observable Markov Decision Process (POMDP) framework. However past research on POMDP-based dialogue systems usually assumed the parameters of the decision process were known a priori. The main contribution of this paper is to present a Bayesian reinforcement learning framework for learning the POMDP parameters online from data, in a decision-theoretic manner. We discuss various approximations and assumptions which can be leveraged to ensure computational tractability, and apply these techniques to learning observation models for several simulated spoken dialogue domains.