Chinese Web Text Outlier Mining Based on Domain Knowledge

3 Author(s)
Xia Huosong (Dept. of Inf. Manage. & Inf. Syst., Wuhan Textile Univ., Wuhan, China); Fan Zhaoyan; Peng Liuyan

Web text mining is a growing research area in data mining. Existing Web text mining algorithms concentrate on finding frequent patterns and discard the less frequent ones, which may contain outliers. In addition, domain knowledge differs in part from one industry to another, yet Web texts are analyzed using the same dictionary regardless of the domain they belong to. This paper proposes formal definitions of Web text outliers and Web text outlier mining, and presents a framework for Web text outlier mining based on domain knowledge. To verify the feasibility of the framework, an algorithm for mining Chinese Web text outliers is proposed, based on an improved vector space model (VSM) and n-grams. Experimental results on the insurance topic show that the algorithm effectively finds Chinese Web text outliers in Web text data, with higher precision and recall and lower complexity.
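The abstract does not give the details of the improved VSM or of the domain dictionary, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes plain character bigrams as features, raw term-frequency vectors, and an outlier score defined as one minus the mean cosine similarity to the other documents. The function names (`char_ngrams`, `vectorize`, `cosine`, `outlier_scores`) and the sample sentences are hypothetical and chosen for illustration.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text (whitespace removed)."""
    chars = "".join(text.split())
    return [chars[i:i + n] for i in range(len(chars) - n + 1)]


def vectorize(text, n=2):
    """Build a sparse term-frequency vector over character n-grams."""
    return Counter(char_ngrams(text, n))


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def outlier_scores(docs, n=2):
    """Score each document as 1 - mean cosine similarity to all the others.

    This scoring rule is an assumption for illustration; the paper's own
    definition of a Web text outlier is not reproduced here.
    """
    vectors = [vectorize(d, n) for d in docs]
    scores = []
    for i, v in enumerate(vectors):
        sims = [cosine(v, w) for j, w in enumerate(vectors) if j != i]
        scores.append(1.0 - sum(sims) / len(sims) if sims else 0.0)
    return scores


# Hypothetical sample: three insurance-related sentences and one unrelated one.
docs = [
    "保险公司推出新的健康保险产品",
    "健康保险产品的保险费用说明",
    "保险公司调整保险费用",
    "今天天气晴朗适合郊游",
]
scores = outlier_scores(docs)
most_unusual = max(range(len(docs)), key=scores.__getitem__)
print(most_unusual, [round(s, 3) for s in scores])
```

Character n-grams avoid the need for a Chinese word segmenter, which is one common reason to pair them with a vector space model. A domain-specific approach along the lines the abstract describes would presumably reweight or filter these features using an industry dictionary, which this sketch leaves out.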

Published in:

Intelligent Systems (GCIS), 2010 Second WRI Global Congress on (Volume: 2)

Date of Conference:

16-17 Dec. 2010