
Improve VSM text classification by title vector based document representation method

2 Author(s)
Tian Xia ; Dept. of Comput. & Inf., Shanghai Second Polytech. Univ., Shanghai, China ; Yi Du

Text classification is a daunting task because it is difficult to extract the semantics of natural-language texts. Many problems must be resolved before natural-language processing techniques can be effectively applied to a large collection of texts. A significant one is extracting semantic information from a plain-text corpus. In the Vector Space Model (VSM), a document is conceptually represented by a vector of terms extracted from the document, with associated weights representing the importance of each term in the document and within the whole document collection. Likewise, an unclassified document is modeled as a list of terms with associated weights representing the importance of those terms in it. Many techniques introduce statistical information about terms to represent their semantic information. However, the document title is usually not given special consideration, even though it clearly contains much semantic information. This paper proposes the Title Vector to address this issue.
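The term-weighting idea described in the abstract can be illustrated with a minimal sketch. It assumes TF-IDF as the weighting scheme and cosine similarity for comparing documents, neither of which the abstract specifies. The `weight_title` helper is purely hypothetical: it repeats title tokens to raise their weight and is not the paper's Title Vector method, whose details are not given here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


def weight_title(body_tokens, title_tokens, boost=2):
    """Hypothetical helper: emphasize the title by repeating its tokens.

    This is NOT the paper's Title Vector method, only a simple stand-in.
    """
    return list(body_tokens) + list(title_tokens) * boost


def tfidf_vectors(docs):
    """Map each token list to a sparse {term: weight} vector using TF-IDF."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up example documents, for illustration only.
docs = [
    tokenize("stock market prices fall"),
    tokenize("market traders sell stock"),
    tokenize("football team wins match"),
]
vecs = tfidf_vectors(docs)
similar = cosine(vecs[0], vecs[1])    # documents sharing "stock" and "market"
unrelated = cosine(vecs[0], vecs[2])  # documents sharing no terms
```

In this sketch, a term's weight grows with its frequency in the document and shrinks as it appears in more documents of the collection, which matches the two kinds of importance the abstract mentions.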

Published in:

Computer Science & Education (ICCSE), 2011 6th International Conference on

Date of Conference:

3-5 Aug. 2011