Skip to Main Content
Text Classification is a daunting task because it is difficult to extract the semantics of natural language texts. Many problems must be resolved before natural-language processing techniques can be effectively applied to a large collection of texts. A significant one is to extract semantic information from corpus in plan text. In Vector Space Model, a document is conceptually represented by a vector of terms extracted from each document, with associated weights representing the importance of each term in the document and within the whole document collection. Likewise, an unclassified document is also modeled as a list of terms with associated weights representing the importance of the terms in it. Many techniques introduces much statistical information of terms to represent their semantic information. However, as always, document title is not taken into special consideration, while it obviously contains much semantic information. This paper proposes Title Vector to address this issue.