Skip to Main Content
Text Categorization is the process of automatically assigning predefined categories to free text documents. Feature weighting, which calculates feature (term) values in documents, is an important preprocessing technique in text categorization. In this paper, we purpose Thai Document Categorization Framework focusing on the comparison of various term weighting schemes, including Boolean, tf, tf-idf, tfc, ltc entropy and tf-rf weighting. We have evaluated these methods on Thai news article corpus with three supervised learning classifiers. We found tf-rf weighting most effective in our experiments with SVM NB and DT algorithms. Based on our experiments, using tf-rf weighting with SVM algorithm yielded the best performance with the F-measure equaling 95.9%.