Skip to Main Content
Document Topic Extraction aims at using several key phrases to describe the topics of documents. It can be applied in web document categorization and tagging, document clusters topic description and information retrieval tasks. In this paper, we propose a Wikipedia category-based document topic extraction method. Document is mapped to a set of Wikipedia categories and is represented as graph structure in order to conserve the relationship between Wikipedia categories. Then, document topic can be extracted by clustering the related Wikipedia categories in the document collection. Experiment in real data shows Wikipedia category-based document topic extraction method achieves the better result than latent topic modeling method, such as LDA.
Date of Conference: 15-19 April 2011