Managing multimodal interactions between humans and computer systems requires a combination of state estimation based on multiple observation streams, and optimisation of time-dependent action selection. Previous work using partially observable Markov decision processes (POMDPs) for multimodal interaction has focused on simple turn-based systems. However, state persistence and implicit state transitions are frequent in real-world multimodal interactions. These phenomena cannot be fully modelled by turn-based systems, where the timing of system actions is a non-trivial issue. In addition, in prior work the POMDP parameterisation has been either hand-coded or learned from labelled data, which requires significant domain-specific knowledge and is labour-intensive. We therefore propose a nonparametric Bayesian method to automatically infer the (distributional) representations of POMDP states for multimodal interactive systems, without using any domain knowledge. We develop an extended version of the infinite POMDP method to better address the state persistence, implicit transition, and timing issues observed in real data. The main contribution is a "sticky" infinite POMDP model that is biased towards self-transitions. The performance of the proposed unsupervised approach is evaluated on both artificially synthesised data and a manually transcribed and annotated human-human interaction corpus. We show statistically significant improvements (e.g. in the ability of the planner to recall human bartender actions) over a supervised POMDP method.
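To make the self-transition bias concrete, the sketch below samples a transition matrix whose rows are Dirichlet-distributed with extra concentration mass on the diagonal, in the spirit of the sticky prior described above. This is a minimal finite-state illustration, not the paper's actual nonparametric model; the parameter names `alpha` and `kappa` and the truncation to a fixed number of states are assumptions for exposition.

```python
import numpy as np

def sample_sticky_transitions(n_states, alpha=1.0, kappa=10.0, seed=None):
    """Sample a row-stochastic transition matrix with a "sticky" bias:
    each row i is Dirichlet-distributed with extra concentration kappa
    on entry (i, i), favouring self-transitions (state persistence).

    Note: alpha/kappa and the finite truncation are illustrative; the
    full model places this bias within a nonparametric (infinite) prior.
    """
    rng = np.random.default_rng(seed)
    T = np.empty((n_states, n_states))
    for i in range(n_states):
        conc = np.full(n_states, alpha)
        conc[i] += kappa  # bias state i towards remaining in state i
        T[i] = rng.dirichlet(conc)
    return T

# With kappa much larger than alpha, the sampled diagonal entries
# dominate on average, modelling persistent interaction states.
T = sample_sticky_transitions(4, alpha=0.5, kappa=20.0, seed=0)
```

Increasing `kappa` relative to `alpha` strengthens the expected self-transition probability (here roughly (alpha + kappa) / (n_states * alpha + kappa) per row), which is the mechanism by which the sticky prior captures state persistence between observations.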