
Using multiple features and statistical model to calculate text units similarity

5 Author(s)
Yong-Dong Xu (School of Computer Science and Technology, Harbin Institute of Technology, China); Zhi-Ming Xu; Xiao-Long Wang; Yuan-Chao Liu

In many NLP applications, a common problem is identifying similar information within a set of related documents. In this paper, the similarity between two Chinese text units is determined by multiple features extracted from those units: word statistical features, part-of-speech features, semantic features, a word density feature, and text discourse structure features. In addition, a statistical method based on a logistic regression model is proposed to automatically fuse these features and calculate the similarity between text paragraphs. An experiment comparing this method with two widely used methods shows the effectiveness of the approach.
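The abstract does not give the exact feature definitions or the training procedure, so the following is only a minimal Python sketch of the general idea: compute several similarity features for a pair of text units, then let a logistic regression model learn how to weight and combine them into one score. It assumes two hypothetical stand-in features (term-frequency cosine and token-set Jaccard overlap) in place of the paper's full feature set, pre-segmented token lists as input (Chinese text would need word segmentation first), and a plain batch gradient-descent fit. The toy training pairs are invented for illustration and are not data from the paper.

```python
import math
from collections import Counter

# Hypothetical stand-ins for the paper's "word statistical features".
# The POS, semantic, word-density and discourse features are NOT modeled here.
# Inputs are token lists; Chinese text would need word segmentation first.


def cosine_tf(a, b):
    """Cosine similarity between the term-frequency vectors of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[tok] * cb[tok] for tok in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a, b):
    """Shared vocabulary size divided by combined vocabulary size."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def features(a, b):
    """Feature vector for one pair of text units."""
    return [cosine_tf(a, b), jaccard(a, b)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this pair: p(similar) - true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def similarity(a, b, w, bias):
    """Fused similarity: the model's probability that the pair is 'similar'."""
    x = features(a, b)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias)


# Invented toy pairs (label 1 = similar, 0 = not similar), not data from the paper.
pairs = [
    ("the cat sat on the mat", "a cat sat on the mat", 1),
    ("stock prices rose today", "stock prices rose sharply today", 1),
    ("the cat sat on the mat", "stock prices rose today", 0),
    ("rain is expected tomorrow", "the team won the final", 0),
]
X = [features(s.split(), t.split()) for s, t, _ in pairs]
y = [label for _, _, label in pairs]
w, bias = train_logreg(X, y)

print(similarity("the cat sat on a mat".split(), "a cat sat on the mat".split(), w, bias))
```

Because the fused score is a probability, a threshold (0.5 would be a natural but assumed choice) can turn it into a similar/not-similar decision. Adding the remaining feature types would only lengthen the vector returned by the hypothetical `features` function; the fusion step itself stays the same.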

Published in:

Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Volume 6

Date of Conference:

18-21 Aug. 2005