Skip to Main Content
Automatic concept segmentation and labeling are the fundamental problems of spoken language understanding in dialog systems. Such tasks are usually approached by using generative or discriminative models based on n-grams. As the uncertainty or ambiguity of the spoken input to dialog system increase, we expect to need dependencies beyond n-gram statistics. In this paper, a general purpose statistical syntactic parser is used to detect syntactic/semantic dependencies between concepts in order to increase the accuracy of sentence segmentation and concept labeling. The main novelty of the approach is the use of new tree kernel functions which encode syntactic/semantic structures in discriminative learning models. We experimented with support vector machines and the above kernels on the standard ATIS dataset. The proposed algorithm automatically parses natural language text with off-the-shelf statistical parser and labels the syntactic (sub)trees with concept labels. The results show that the proposed model is very accurate and competitive with respect to state-of-the-art models when combined with n-gram based models.