Research on the categorization accuracy of different similarity measures on Chinese texts

Authors: Xiangdong Li (Sch. of Inf. Manage., Wuhan Univ., Wuhan, China); Hangyu Liu; Han Jia; Li Huang

This paper studies one of the most intensively studied algorithms, the k Nearest Neighbor (kNN) algorithm. The purpose is to investigate how different similarity measures perform in kNN on Chinese texts. The two measures we focus on are cosine similarity and Jensen-Shannon divergence. We use both the Sogou corpus, whose data are extracted from the Sohu.com website, and datasets that we have processed from real-world data. The results of our experiments indicate that the choice of similarity measure significantly affects categorization accuracy.
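The abstract gives no implementation details, so the following is only a minimal sketch of how kNN categorization with the two measures might look, not the authors' method. It assumes documents are already word-segmented and represented as raw term-frequency counts, that neighbors vote by simple majority, and that Jensen-Shannon divergence uses base-2 logarithms. The function names (`cosine_similarity`, `js_divergence`, `knn_classify`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def js_divergence(a, b):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two
    term-frequency vectors after normalizing each to a distribution."""
    total_a = sum(a.values())
    total_b = sum(b.values())
    if total_a == 0 or total_b == 0:
        return 1.0  # assumption: an empty document is maximally distant
    div = 0.0
    for t in set(a) | set(b):
        p = a.get(t, 0.0) / total_a
        q = b.get(t, 0.0) / total_b
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            div += 0.5 * p * math.log2(p / m)
        if q > 0:
            div += 0.5 * q * math.log2(q / m)
    return div


def knn_classify(query, training, k, measure="cosine"):
    """Label a query by majority vote among its k nearest training documents."""
    if measure == "cosine":
        # Higher similarity means closer, so sort descending.
        scored = [(cosine_similarity(query, doc), label) for doc, label in training]
        scored.sort(key=lambda pair: pair[0], reverse=True)
    elif measure == "jsd":
        # Lower divergence means closer, so sort ascending.
        scored = [(js_divergence(query, doc), label) for doc, label in training]
        scored.sort(key=lambda pair: pair[0])
    else:
        raise ValueError("measure must be 'cosine' or 'jsd'")
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy, pre-segmented documents (hypothetical data, not from the paper).
training = [
    (Counter(["股票", "市场", "上涨"]), "finance"),
    (Counter(["银行", "利率", "市场"]), "finance"),
    (Counter(["球队", "比赛", "进球"]), "sports"),
    (Counter(["比赛", "冠军", "球员"]), "sports"),
]
query = Counter(["市场", "利率", "下跌"])

print(knn_classify(query, training, k=3, measure="cosine"))
print(knn_classify(query, training, k=3, measure="jsd"))
```

Note the opposite orientation of the two measures: cosine is a similarity (larger is closer), while Jensen-Shannon divergence is a dissimilarity (smaller is closer), so the neighbor ranking has to be reversed when switching between them.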

Published in:

2011 International Conference on Business Management and Electronic Information (BMEI), Volume 4

Date of Conference:

13-15 May 2011