Developing an Effective Thai Document Categorization Framework Based on Term Relevance Frequency Weighting


Authors:
Nivet Chirawichitchai ; Department of Information Technology, Faculty of Information Technology, King Mongkut's University of Technology, North Bangkok, Thailand ; Parinya Sa-nguansat ; Phayung Meesad

Text categorization is the process of automatically assigning predefined categories to free-text documents. Feature weighting, which calculates the values of features (terms) in documents, is an important preprocessing step in text categorization. In this paper, we propose a Thai Document Categorization Framework focusing on the comparison of various term weighting schemes, including Boolean, tf, tf-idf, tfc, ltc, entropy, and tf-rf weighting. We evaluated these methods on a Thai news article corpus with three supervised learning classifiers: support vector machine (SVM), naive Bayes (NB), and decision tree (DT). We found tf-rf weighting to be the most effective scheme in our experiments with the SVM, NB, and DT algorithms, and tf-rf weighting combined with SVM yielded the best overall performance, with an F-measure of 95.9%.
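Several of the weighting schemes compared above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes documents have already been segmented into words (Thai word segmentation is not shown), and it uses common textbook definitions, namely natural-log idf, cosine normalization for tfc and ltc, and relevance frequency as it is usually defined in the tf-rf literature, rf = log2(2 + a / max(1, c)), where a and c count the positive- and negative-category documents containing a term. The paper's exact variants may differ, the entropy scheme is omitted for brevity, and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def boolean_weights(doc):
    """Boolean: 1.0 for every term that occurs in the document."""
    return {t: 1.0 for t in set(doc)}


def tf_weights(doc):
    """tf: raw count of each term in the document."""
    return {t: float(c) for t, c in Counter(doc).items()}


def document_frequency(corpus):
    """Number of documents in which each term appears."""
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    return df


def tfidf_weights(doc, df, n_docs):
    """tf-idf: tf * log(N / df). Assumes every term of doc appears in df."""
    return {t: c * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}


def cosine_normalize(weights):
    """Scale a weight vector to unit Euclidean length (left as-is if all zero)."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm > 0 else dict(weights)


def tfc_weights(doc, df, n_docs):
    """tfc: cosine-normalized tf-idf."""
    return cosine_normalize(tfidf_weights(doc, df, n_docs))


def ltc_weights(doc, df, n_docs):
    """ltc: (1 + log tf) * log(N / df), then cosine-normalized."""
    raw = {t: (1.0 + math.log(c)) * math.log(n_docs / df[t])
           for t, c in Counter(doc).items()}
    return cosine_normalize(raw)


def relevance_frequency(corpus, labels, positive):
    """rf(t) = log2(2 + a / max(1, c)) for a one-vs-rest split on `positive`.

    a = number of positive-category documents containing t,
    c = number of negative-category documents containing t.
    """
    a, c = Counter(), Counter()
    for doc, label in zip(corpus, labels):
        (a if label == positive else c).update(set(doc))
    return {t: math.log2(2 + a[t] / max(1, c[t])) for t in set(a) | set(c)}


def tfrf_weights(doc, rf):
    """tf-rf: tf * rf; an unseen term gets rf = log2(2) = 1."""
    return {t: c * rf.get(t, 1.0) for t, c in Counter(doc).items()}


# Hypothetical toy corpus: English tokens stand in for segmented Thai words.
corpus = [["election", "vote", "vote"], ["football", "goal"], ["vote", "goal"]]
labels = ["politics", "sport", "politics"]

df = document_frequency(corpus)
rf = relevance_frequency(corpus, labels, positive="politics")
print(tfrf_weights(corpus[0], rf))
```

Unlike idf, rf depends on the category labels, so in a multi-class setting it is computed once per category in a one-vs-rest fashion; the resulting document vectors would then be passed to a classifier such as an SVM.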

Published in:

2010 Eighth International Conference on ICT and Knowledge Engineering

Date of Conference:

24-25 Nov. 2010