Enhancing text classification using synopses extraction

Authors: Liping Ma (Sch. of Comput. Sci. & Eng., New South Wales Univ., Sydney, NSW, Australia); J. Shepherd; Yanchun Zhang

This paper describes a novel approach to document classification that uses decision-tree machine learning based on a succinct vector of important terms in each document. The succinct vector is itself generated by machine learning: the approach builds parsers that identify significant features in a document by partitioning it into regions based on low-level document characteristics. Because the feature vector is succinct, it avoids the very large term vectors that have hindered the application of conventional machine learning to document classification. Because the parser can be trained to extract only the important terms from a document, small training sets suffice to achieve the same classification accuracy as conventional approaches.
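
The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract says the parser that picks out significant regions is learned, whereas this sketch substitutes a fixed, hypothetical rule (only the first line of each document counts as the important region). The vocabulary, the toy documents, the labels, and the function names `succinct_vector`, `build_tree`, and `classify` are likewise assumptions made for illustration. The decision tree is a plain ID3-style learner over binary term-presence features.

```python
import math
from collections import Counter

def succinct_vector(doc, vocab):
    """Binary term-presence vector built only from the 'important' region.
    Assumption: the first line stands in for the region a trained parser
    would select; the paper learns this step instead of hard-coding it."""
    region_words = set(doc.split("\n", 1)[0].lower().split())
    return tuple(int(term in region_words) for term in vocab)

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def build_tree(rows, labels, features):
    """ID3-style tree: split on the feature with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf node holds a class label

    def gain(f):
        remainder = 0.0
        for value in (0, 1):
            subset = [l for r, l in zip(rows, labels) if r[f] == value]
            if subset:
                remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    node = {"feature": best, "children": {}}
    for value in (0, 1):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if pairs:
            node["children"][value] = build_tree(
                [r for r, _ in pairs], [l for _, l in pairs], rest)
        else:
            node["children"][value] = majority  # unseen value: fall back
    return node

def classify(tree, row):
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree

# Toy data (hypothetical): the body text after the first line is noise.
vocab = ["tree", "learning", "football", "match"]
training = [
    ("Decision tree learning\nThe match was discussed briefly.", "ml"),
    ("Learning with kernels\nfootball is unrelated here.", "ml"),
    ("Football match report\nA tree stood near the pitch.", "sport"),
    ("Cup match preview\nLearning from last season.", "sport"),
]
rows = [succinct_vector(doc, vocab) for doc, _ in training]
labels = [label for _, label in training]
tree = build_tree(rows, labels, list(range(len(vocab))))

new_doc = "Machine learning survey\nfootball match coverage"
predicted = classify(tree, succinct_vector(new_doc, vocab))
print(predicted)
```

Because the vector is built from the selected region alone, misleading terms in the body (here, "football match" in the new document) never reach the classifier, and the tree only has to consider a handful of features.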

Published in: Web Information Systems Engineering, 2003. WISE 2003. Proceedings of the Fourth International Conference on

Date of Conference: 10-12 Dec. 2003