Short Text Feature Extraction and Clustering for Web Topic Mining

Authors: Hui He (Beijing Univ. of Posts & Telecommun., Beijing); Bo Chen; Weiran Xu; Jun Guo

This paper introduces an algorithm that clusters Chinese short texts for web topic mining, based on Chinese chunks. To suit the characteristics of Chinese short texts, the algorithm uses N-gram feature extraction to capture Chinese chunks, which reflect the semantic structure of the text and the dependencies between characters. The RPCL algorithm is then applied to cluster the texts with high precision; it does not need to know the exact number of clusters in advance. Experimental results show that, compared with traditional methods, this approach markedly reduces dimensionality and improves clustering performance on Chinese short texts.
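The abstract gives no implementation details, so the following is only a minimal sketch of the two stages it names, under stated assumptions. It treats "N-gram feature extraction" as overlapping character n-grams filtered by document frequency, and reads RPCL as Rival Penalized Competitive Learning in its textbook form (the winning center moves toward each sample, the runner-up is pushed slightly away). The function names, the `min_df` filter, the learning rates, and the use of raw counts with squared Euclidean distance are all hypothetical choices for illustration, not the authors' settings.

```python
import random
from collections import Counter


def char_ngrams(text, n_values=(2, 3)):
    """Return overlapping character n-grams of a text, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    grams = []
    for n in n_values:
        grams.extend("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))
    return grams


def build_vectors(texts, min_df=2, n_values=(2, 3)):
    """Keep n-grams found in at least min_df texts (assumed filter) and
    return the vocabulary plus one count vector per text."""
    doc_grams = [Counter(char_ngrams(t, n_values)) for t in texts]
    # Iterating a Counter yields its keys, so this counts documents, not occurrences.
    doc_freq = Counter(g for grams in doc_grams for g in grams)
    vocab = sorted(g for g, c in doc_freq.items() if c >= min_df)
    vectors = [[float(grams[g]) for g in vocab] for grams in doc_grams]
    return vocab, vectors


def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rpcl(vectors, k, epochs=50, lr_win=0.05, lr_rival=0.002, seed=0):
    """Textbook-style RPCL sketch. k is an upper bound on the cluster count
    (2 <= k <= len(vectors)); surplus centers tend to be pushed away from the data."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    wins = [1] * k  # winning counts, used as a conscience factor
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)
        for idx in order:
            x = vectors[idx]
            total = sum(wins)
            ranked = sorted(
                range(k),
                key=lambda j: wins[j] / total * sqdist(x, centers[j]),
            )
            winner, rival = ranked[0], ranked[1]
            # Winner learns: move toward the sample.
            centers[winner] = [w + lr_win * (xi - w) for w, xi in zip(centers[winner], x)]
            # Rival is penalized: move away from the sample at a much smaller rate.
            centers[rival] = [w - lr_rival * (xi - w) for w, xi in zip(centers[rival], x)]
            wins[winner] += 1
    # Final assignment uses plain nearest-center distance.
    labels = [min(range(k), key=lambda j: sqdist(x, centers[j])) for x in vectors]
    return centers, labels
```

A typical call would be `vocab, vectors = build_vectors(texts)` followed by `centers, labels = rpcl(vectors, k=10)`, with `k` set deliberately larger than the expected number of topics; the clusters that actually receive texts in `labels` then give the estimated topic count.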

Published in:

Semantics, Knowledge and Grid, Third International Conference on

Date of Conference:

29-31 Oct. 2007