In this paper, we investigate two statistical methods for spoken language understanding based on statistical machine translation. The first approach employs the source-channel paradigm, whereas the second uses the maximum entropy framework. Starting with an annotated corpus, we describe the problem of natural language understanding as a translation from a source sentence to a formal-language target sentence. We analyze the quality of different alignment models and feature functions and show that the direct maximum entropy approach outperforms the source-channel-based method. Furthermore, we examine how both methods perform if the input sentences contain speech recognition errors. Finally, we investigate a new approach to combine speech recognition and spoken language understanding. For this purpose, we employ minimum error rate training, which directly optimizes the final evaluation criterion. By combining all knowledge sources in a log-linear way, we show that we can decrease both the word error rate and the slot error rate. Experiments were carried out on two German in-house corpora for spoken dialogue systems.
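The log-linear combination of knowledge sources mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the feature names (acoustic, lm, slu), the toy hypotheses, their scores, and the weights are hypothetical, and the weights are set by hand rather than tuned with minimum error rate training.

```python
def log_linear_score(features, weights):
    """Weighted sum of log-domain feature scores for one hypothesis."""
    return sum(weights[name] * value for name, value in features.items())


def best_hypothesis(hypotheses, weights):
    """Return the hypothesis with the highest log-linear score."""
    return max(hypotheses, key=lambda h: log_linear_score(h["features"], weights))


# Hypothetical recognizer hypotheses with made-up log-scores per knowledge source.
hypotheses = [
    {"words": "i want to go to bonn",
     "features": {"acoustic": -10.0, "lm": -5.0, "slu": -9.0}},
    {"words": "i want to go to berlin",
     "features": {"acoustic": -11.0, "lm": -5.0, "slu": -3.0}},
]

# Recognition features only: the understanding score is switched off.
asr_only = {"acoustic": 1.0, "lm": 1.0, "slu": 0.0}
# All knowledge sources combined with equal (hand-picked) weights.
combined = {"acoustic": 1.0, "lm": 1.0, "slu": 1.0}

print(best_hypothesis(hypotheses, asr_only)["words"])
print(best_hypothesis(hypotheses, combined)["words"])
```

In this toy setting the selected hypothesis changes once the understanding score is included, which is the kind of interaction a joint log-linear model makes possible. In practice the weights would be optimized against the evaluation criterion instead of being fixed by hand.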
IEEE Transactions on Audio, Speech, and Language Processing (Volume: 17, Issue: 4)
Date of Publication: May 2009