Automatic terminology extraction requires termhood verification for the terms extracted in a specific domain. Chinese terminology extraction suffers from insufficient domain corpora for verification, even though there is an abundance of information in other languages. This paper presents a novel approach that overcomes this problem by using word translations and bilingual web resources to improve both coverage and precision. The proposed approach calculates termhood using bilingual information drawn from the candidate terms themselves and from existing domain knowledge. In contrast to previous research, this method is not confined to pre-determined corpora. Preliminary experiments show improvements of 14.8% in coverage and 26.3% in precision.
Published in:
International Conference on Natural Language Processing and Knowledge Engineering, 2007 (NLP-KE 2007)
Date of Conference: Aug. 30 2007-Sept. 1 2007