
Improving Thai educational Web page classification using inverse class frequency

Authors: V. Lertnattee (Fac. of Pharmacy, Silpakorn Univ., Nakorn Pathom, Thailand) and T. Theeramunkong

Automatic text classification for a Web collection is a challenging task, especially when the language is not English, as with Thai. Moreover, most Thai educational Web pages contain English terms because of their technical content, and university Web sites contain many technical terms and typing errors in both Thai and English. Most previous work on text categorization used term frequency and inverse document frequency to represent the importance of terms. In this paper, we use inverse class frequency instead of inverse document frequency in centroid-based text categorization, because it works well on a collection with a large number of unique terms. The experimental results show that inverse class frequency is useful, especially when it is applied to both prototype and query vectors.
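The abstract does not state the exact weighting formula, so the following is only a minimal sketch of centroid-based classification with inverse class frequency, under stated assumptions. It assumes the common definition icf(t) = log(|C| / cf(t)), where |C| is the number of classes and cf(t) is the number of classes whose training documents contain term t, multiplied by raw term frequency. Each class prototype is taken to be the normalized sum of normalized document vectors, and a query is assigned to the class whose prototype has the highest cosine similarity. ICF is applied to both the prototypes and the query, the setting the abstract reports as most useful. The function names (`icf_weights`, `build_prototypes`, `classify`) and the toy corpus are hypothetical; the paper's own normalization and tokenization (including Thai word segmentation) may differ.

```python
import math
from collections import Counter

# ASSUMPTION: icf(t) = log(|C| / cf(t)); the paper's exact formula may differ.
def icf_weights(class_docs):
    """Map each term to its inverse class frequency.

    class_docs: dict of class label -> list of token lists (hypothetical format).
    """
    num_classes = len(class_docs)
    class_freq = Counter()
    for docs in class_docs.values():
        # Count each term at most once per class.
        class_freq.update({t for doc in docs for t in doc})
    return {t: math.log(num_classes / cf) for t, cf in class_freq.items()}

def weight(tokens, icf):
    """tf x icf vector; terms never seen in training are dropped."""
    tf = Counter(tokens)
    return {t: f * icf[t] for t, f in tf.items() if t in icf}

def normalize(vec):
    """Scale to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return dict(vec) if norm == 0 else {t: v / norm for t, v in vec.items()}

def build_prototypes(class_docs, icf):
    """Centroid per class: normalized sum of normalized tf x icf document vectors."""
    prototypes = {}
    for label, docs in class_docs.items():
        centroid = Counter()
        for doc in docs:
            centroid.update(normalize(weight(doc, icf)))
        prototypes[label] = normalize(centroid)
    return prototypes

def classify(tokens, prototypes, icf):
    """Return the class whose prototype has the highest cosine with the query.

    ICF is applied to the query vector as well as to the prototypes.
    """
    query = normalize(weight(tokens, icf))

    def cosine(proto):
        # Both vectors are unit length, so the dot product is the cosine.
        return sum(v * proto.get(t, 0.0) for t, v in query.items())

    return max(prototypes, key=lambda label: cosine(prototypes[label]))

# Hypothetical toy corpus (pre-tokenized; real Thai text needs word segmentation).
train = {
    "pharmacy": [["drug", "dose", "course"], ["drug", "clinic", "syllabus"]],
    "engineering": [["circuit", "voltage", "course"], ["circuit", "robot", "syllabus"]],
}
icf = icf_weights(train)
prototypes = build_prototypes(train, icf)
print(classify(["drug", "course"], prototypes, icf))  # pharmacy
```

Because "course" and "syllabus" occur in both classes, their ICF is log(2/2) = 0 and they contribute nothing to either prototype or query; the decision rests entirely on class-specific terms such as "drug". This illustrates why class-level rather than document-level frequency can suppress terms that are common across categories.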

Published in:

IEEE International Symposium on Communications and Information Technology (ISCIT 2005), Volume 2

Date of Conference:

12-14 Oct. 2005