This paper presents a multi-stage spoken language understanding system. Its stochastic module is, for the first time, based on a combination of dynamic Bayesian networks and conditional random field classifiers. The former, which are generative models, derive basic concept sequences from the word sequences; the latter, which are discriminative models, then augment these concept sequences with modalities and hierarchical information. To provide efficiently smoothed conditional probability estimates, the network edges are implemented as factored language models with a generalized parallel backoff procedure. This framework allows great flexibility in probability representation, which facilitates the development of the stochastic levels (semantic and lexical) of the system. Experiments are carried out on the French MEDIA task (tourist information and hotel booking). The MEDIA training corpus of 10k utterances is conceptually rich (more than 80 basic concepts) and is provided with a manually segmented annotation. On this complex task, the proposed multi-stage system is shown to outperform the best system of the MEDIA'05 evaluation campaign (H. Bonneau-Maynard et al., 2006).