Abstract:
We present a keyword extraction system for Mongolian documents based on word co-occurrence statistics, an approach that has been used for English, Chinese, and other languages. The method extracts the most frequent words and builds a co-occurrence matrix showing the occurrences of each frequent word. The degree of bias between each word and the set of frequent words is measured using the chi-square (χ²) method. In addition, the weight of each word and of the set of frequent words is measured using word frequency - inverted word frequency (WF-IWF). Words with high χ² values and high WF-IWF values are therefore likely to be keywords. The χ² method adopted in this study is compared with another method, based on WF-IWF, which has been tested for Mongolian. Two different documents were used to evaluate system performance and the effectiveness of the χ² and WF-IWF methods. The results show that the χ² method performs better than WF-IWF.
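To make the χ² step concrete, the following is a minimal sketch and not the authors' implementation. It assumes that co-occurrence is counted per sentence, that the expected count is the standard contingency-table estimate (row total times column total divided by grand total), and that a word is not compared with itself. The function name `chi_square_keywords`, the parameter `top_k`, and the toy tokens are hypothetical; real Mongolian input would first need tokenization and stop-word handling, which are omitted here.

```python
from collections import Counter


def chi_square_keywords(sentences, top_k=3):
    """Rank words by how unevenly they co-occur with the top_k frequent words.

    `sentences` is a list of token lists. This is an illustrative
    simplification, not the scoring used in the paper.
    """
    # Word frequency over the whole document.
    freq = Counter(w for s in sentences for w in s)
    frequent = [w for w, _ in freq.most_common(top_k)]
    frequent_set = set(frequent)

    # co[w][g]: number of sentences that contain both w and frequent word g.
    co = {w: Counter() for w in freq}
    for s in sentences:
        present = set(s)
        for w in present:
            for g in present & frequent_set:
                if g != w:
                    co[w][g] += 1

    # Column totals and grand total of the co-occurrence matrix.
    col = Counter()
    for row in co.values():
        col.update(row)
    total = sum(col.values())
    if total == 0:
        # No co-occurrence at all: every word gets a zero score.
        return [(w, 0.0) for w in freq]

    scores = {}
    for w, row in co.items():
        n_w = sum(row.values())  # row total for w
        score = 0.0
        for g in frequent:
            if g == w:
                continue
            expected = n_w * col[g] / total
            if expected > 0:
                score += (row[g] - expected) ** 2 / expected
        scores[w] = score

    # Highest chi-square first: these are the keyword candidates.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy input with placeholder tokens (hypothetical data).
toy = [["a", "b"], ["a", "b"], ["a", "c"], ["d", "c"]]
for word, score in chi_square_keywords(toy, top_k=2):
    print(word, round(score, 3))
```

A word that co-occurs with the frequent words in roughly the proportions expected by chance scores near zero, while a word that is strongly tied to one particular frequent word scores high, which is the sense of "bias" used above.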
Published in: 2017 29th Chinese Control And Decision Conference (CCDC)
Date of Conference: 28-30 May 2017
Date Added to IEEE Xplore: 17 July 2017
ISBN Information:
Electronic ISSN: 1948-9447