A comparison of four metrics for auto-inducing semantic classes
Pargellis, A.
Fosler-Lussier, E.
Potamianos, A.
Chin-Hui Lee
Dialogue Syst. Res. Dept, Lucent Technol. Bell Labs., Murray Hill, NJ, USA;
Abstract
A speech understanding system typically includes a natural language understanding module that defines concepts, i.e., groups of semantically related words. It is a challenge to build a set of concepts for a new domain for which prior knowledge and training data are limited. In our work, concepts are induced automatically from unannotated training data by grouping semantically similar words and phrases together into concept classes. Four context-dependent similarity metrics are proposed and their performance for auto-inducing concepts is evaluated. Two of these metrics are based on the Kullback-Leibler (KL) distance measure, a third is the Manhattan norm, and the fourth is the vector product (VP) similarity measure. The KL and VP metrics consistently underperform the other metrics on the four tasks investigated: movie information, a children's game, travel reservations, and Wall Street Journal news articles. Correct concept classification rates are up to 90% for the movie task.
Index
Terms
Available to subscribers and IEEE members.
References
Available to subscribers and IEEE members.
Citing Documents
Available to subscribers and IEEE members.