This paper proposes a new document representation method for text categorization. It applies category-based semantic field (CBSF) theory to obtain a more efficient representation of documents. Lexical chains are used to compute the CBSF, with HowNet serving as the lexical database. In particular, the title of each document serves as a clue for predicting the potential CBSF of a test document. Combined with a classifier, the approach is evaluated on text categorization, and the results indicate that it outperforms conventional methods whose features are selected under the bag-of-words (BOW) model on the same task.
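As a rough illustration of the idea, the sketch below scores each category by the words of a document that fall into it, giving title words extra weight. It is not the paper's method: the toy lexicon is an invented stand-in for a lexical database such as HowNet, the grouping of words by shared category is only a crude substitute for real lexical chains, and the names `TOY_LEXICON`, `semantic_field_scores`, `predict_category`, and the `title_weight` parameter are all hypothetical. The paper feeds its representation to a classifier, whereas this sketch simply picks the highest-scoring category.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a lexical database such as HowNet.
# Each word maps to one semantic category; these entries are invented.
TOY_LEXICON = {
    "goal": "sports", "match": "sports", "team": "sports",
    "stock": "finance", "market": "finance", "bank": "finance",
}


def semantic_field_scores(title, body, title_weight=2.0):
    """Return a weighted word count per category.

    Words that share a category are treated as one crude "chain".
    Title words count title_weight times as much as body words
    (an assumed weighting, not taken from the paper).
    """
    scores = Counter()
    for text, weight in ((title, title_weight), (body, 1.0)):
        for word in text.lower().split():
            category = TOY_LEXICON.get(word)
            if category is not None:
                scores[category] += weight
    return dict(scores)


def predict_category(title, body):
    """Pick the highest-scoring category, or None if no word is known."""
    scores = semantic_field_scores(title, body)
    return max(scores, key=scores.get) if scores else None
```

For example, `predict_category("Team wins match", "the stock market fell as the team celebrated")` favors the sports category because the two title words are weighted more heavily than the two finance words in the body.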
Published in: Machine Learning and Cybernetics, 2005. Proceedings of 2005 International Conference on (Volume: 6)
Date of Conference: 18-21 Aug. 2005