Multimodal grammars provide an expressive formalism for rapidly bootstrapping finite-state mechanisms for multimodal integration and understanding. These mechanisms align speech and gesture inputs, scale readily to lattice inputs, and enable recovery from speech and gesture recognition errors through mutual compensation. However, like other handcrafted mechanisms, they can be brittle with respect to unexpected, erroneous, or disfluent inputs. In this paper, we show how the robustness of stochastic language models can be combined with the expressiveness of multimodal grammars by adding a finite-state edit machine to the multimodal language processing cascade. We evaluate the effectiveness of the approach in MATCH, a multimodal conversational system that provides restaurant and subway information on a speech- and pen-enabled mobile device.
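To make the role of the edit machine more concrete, the Python sketch below assumes, as a simplification, that its job is to map a recognizer output the grammar cannot parse to the closest string the grammar does accept, using uniform unit costs for word insertion, deletion, and substitution. The grammar strings, the function names `edit_cost` and `closest_in_grammar`, and the costs are all hypothetical. Instead of composing a weighted edit transducer with the grammar and searching for the shortest path, the sketch lists a tiny finite set of in-grammar strings and scores each one with dynamic programming. That shortcut only works when the language is small enough to enumerate.

```python
from typing import List, Sequence, Tuple

# Hypothetical in-grammar commands. A real multimodal grammar is a
# finite-state network, not an explicit list of strings.
HYPOTHETICAL_GRAMMAR: List[str] = [
    "show cheap italian restaurants",
    "show cheap french restaurants",
    "how do i get here",
]


def edit_cost(src: Sequence[str], tgt: Sequence[str],
              ins: float = 1.0, dele: float = 1.0, sub: float = 1.0) -> float:
    """Word-level weighted Levenshtein distance from src to tgt (assumed unit costs)."""
    n, m = len(src), len(tgt)
    # d[i][j] = cheapest way to turn src[:i] into tgt[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele  # delete every source word
    for j in range(1, m + 1):
        d[0][j] = j * ins  # insert every target word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if src[i - 1] == tgt[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,       # delete a source word
                          d[i][j - 1] + ins,        # insert a target word
                          d[i - 1][j - 1] + match)  # keep or substitute
    return d[n][m]


def closest_in_grammar(asr_words: Sequence[str],
                       grammar: Sequence[str] = HYPOTHETICAL_GRAMMAR) -> Tuple[str, float]:
    """Return the in-grammar string with the lowest edit cost, and that cost."""
    scored = [(edit_cost(asr_words, s.split()), s) for s in grammar]
    cost, best = min(scored)  # ties fall back to alphabetical order
    return best, cost


if __name__ == "__main__":
    best, cost = closest_in_grammar("uh show me cheap italian restaurants".split())
    print(best, cost)  # show cheap italian restaurants 2.0
```

In this example, the filler "uh" and the extra word "me" are each deleted at cost 1, so the closest in-grammar string is the one the user most plausibly intended. A finite-state implementation would encode the same costs in a transducer, so it would not need to list the grammar's language explicitly.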
Published in: 2005 IEEE Workshop on Automatic Speech Recognition and Understanding
Date of conference: 27 Nov. 2005