Semiautomatic acquisition of semantic structures for understanding domain-specific natural language queries

2 Author(s)
Meng, H.M. ; Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong, Shatin, China ; Kai-Chung Siu

This paper describes a methodology for semiautomatic grammar induction from unannotated corpora of information-seeking queries in a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to (spoken) natural language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or on the availability of annotated corpora. To strive for reasonable coverage on real data, as well as portability across domains and languages, we adopt a statistical approach. Agglomerative clustering using the symmetrized divergence criterion groups words "spatially". These words have similar left and right contexts and tend to form semantic classes. Agglomerative clustering using mutual information groups words "temporally". These words tend to co-occur sequentially to form phrases or multiword entities. Our approach is amenable to the optional injection of prior knowledge to catalyze grammar induction. The resultant grammar is interpretable by humans and is amenable to hand-editing for refinement. Hence, our approach is semiautomatic in nature. Experiments were conducted using the ATIS (Air Travel Information Service) corpus, and the semiautomatically induced grammar G_SA is compared to an entirely handcrafted grammar G_H. G_H took two months to develop and gave concept error rates of 7 percent and 11.3 percent, respectively, in language understanding of two test corpora. G_SA took only three days to produce and gave concept error rates of 14 percent and 12.2 percent on the corresponding test corpora. These results provide a desirable trade-off between language understanding performance and grammar development effort.
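
The two clustering criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names, the additive smoothing constant, and the use of pointwise mutual information over adjacent word pairs are all assumptions made for illustration. The sketch only scores candidate merges; a full agglomerative procedure would repeatedly merge the best-scoring pair and recompute the statistics.

```python
import math
from collections import Counter, defaultdict

def context_distributions(sentences):
    """Count the left and right neighbours of every word (hypothetical helper)."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        toks = ["<s>"] + sent.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            left[toks[i]][toks[i - 1]] += 1
            right[toks[i]][toks[i + 1]] += 1
    return left, right

def kl(p, q, vocab, eps=1e-6):
    """KL divergence D(p || q) over vocab; eps smoothing is an assumption."""
    p_total = sum(p.values()) + eps * len(vocab)
    q_total = sum(q.values()) + eps * len(vocab)
    d = 0.0
    for v in vocab:
        pv = (p[v] + eps) / p_total
        qv = (q[v] + eps) / q_total
        d += pv * math.log(pv / qv)
    return d

def symmetrized_divergence(w1, w2, left, right, vocab):
    """Sum of D(p||q) + D(q||p) over left and right contexts ("spatial" score)."""
    total = 0.0
    for ctx in (left, right):
        total += kl(ctx[w1], ctx[w2], vocab) + kl(ctx[w2], ctx[w1], vocab)
    return total

def pointwise_mutual_information(sentences):
    """PMI of each adjacent word pair ("temporal" score); a simplified variant."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        toks = sent.split()
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    return {
        (a, b): math.log((c / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), c in bigrams.items()
    }

# Toy queries in the style of an air-travel domain (invented, not from ATIS).
corpus = [
    "flights from boston to denver",
    "flights from denver to boston",
    "fares from boston to new york",
    "fares from new york to denver",
]
left, right = context_distributions(corpus)
vocab = sorted({t for s in corpus for t in s.split()} | {"<s>", "</s>"})

# Lower divergence means more similar contexts, so a better "spatial" merge.
d_city = symmetrized_divergence("boston", "denver", left, right, vocab)
d_mixed = symmetrized_divergence("boston", "from", left, right, vocab)

# Higher PMI means a stronger sequential association, so a better "temporal" merge.
pmi = pointwise_mutual_information(corpus)
best_phrase = max(pmi, key=pmi.get)
```

On this toy corpus the two city names share left and right contexts and so score a much lower divergence than a city name paired with a preposition, while the adjacent pair with the highest PMI is the multiword entity "new york".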

Published in:

IEEE Transactions on Knowledge and Data Engineering (Volume: 14, Issue: 1)