Text categorization is an important research field within text mining. A document is often full of class-independent "general" words that many documents and classes share. These "general" words harm text categorization rather than contribute to it. Inspired by the human cognitive procedure in text classification tasks, we propose a novel approach called the Class Core Extraction (CCE) method to extract "core" terms from each class. The "core" terms, which include not only single words but also combinations of words that act as a simple description of context, must be terms with strong distinguishing power. For the testing phase, we also propose a suitable algorithm, which we call the "lottery" algorithm, that uses a weighted matching strategy to make the final categorization decision. Comparative experiments on two datasets show that our approach is more accurate than the k-nearest-neighbor (kNN) based classifier and markedly more efficient than the Support Vector Machine (SVM) based classifier.
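
The abstract does not give the CCE selection criterion or the details of the "lottery" algorithm, so the following is only a minimal sketch of the general idea: keep terms (single words and word pairs) that are concentrated in one class, then classify by weighted matching. The purity threshold, the weight formula, the toy data, and all names (`terms_of`, `extract_cores`, `classify`, `min_purity`, `TRAIN`, `CORES`) are assumptions made for illustration, not the authors' method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def terms_of(text):
    """Single words plus unordered word pairs that co-occur in the text."""
    words = sorted(set(text.lower().split()))
    pairs = [" ".join(p) for p in combinations(words, 2)]
    return set(words) | set(pairs)


def extract_cores(docs, min_purity=0.8):
    """docs: list of (text, label). Returns {label: {term: weight}}.

    Assumed criterion (not from the paper): a term is a "core" term of a
    class when at least min_purity of the documents containing it belong
    to that class. "General" words shared across classes fail this test.
    """
    doc_freq = defaultdict(Counter)  # term -> per-label document counts
    for text, label in docs:
        for term in terms_of(text):
            doc_freq[term][label] += 1

    cores = defaultdict(dict)
    for term, by_label in doc_freq.items():
        label, count = by_label.most_common(1)[0]
        purity = count / sum(by_label.values())
        if purity >= min_purity:
            # Assumed weight: purity scaled by in-class document frequency.
            cores[label][term] = purity * count
    return dict(cores)


def classify(text, cores):
    """Weighted matching: every matched core term adds its weight to its class."""
    present = terms_of(text)
    scores = {
        label: sum(w for term, w in terms.items() if term in present)
        for label, terms in cores.items()
    }
    return max(scores, key=scores.get)


# Toy training data, invented for this sketch.
TRAIN = [
    ("the match ended with a late goal", "sports"),
    ("the team scored a goal in the match", "sports"),
    ("the market closed with a gain in stocks", "finance"),
    ("the bank said stocks fell in the market", "finance"),
]
CORES = extract_cores(TRAIN)
print(classify("a goal decided the match", CORES))
```

In this toy setup, words such as "the" occur in both classes and are therefore excluded from every core set, while pairs such as "goal match" are kept as short descriptions of context. A real system would need a more careful selection criterion and a way to handle documents that match no core term.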