Using Linguistic Information to Classify Portuguese Text Documents

2 Author(s)
Goncalves, T. (Dept. de Inf., Univ. de Evora, Evora); Quaresma, P.

This paper examines the role of various linguistic structures in text classification, applying the study to the Portuguese language. Besides using a bag-of-words representation, for which we evaluate different measures and use linguistic knowledge for term selection, we perform several experiments with syntactic information, representing documents as strings of words and as strings of syntactic parse trees. To build the classifier we use the support vector machine (SVM) algorithm, which is known to produce good results on text classification tasks, and we apply the study to a dataset of articles from the Publico newspaper. The results show that the syntactic structure of sentences is not useful for text classification (as initially expected), but that part-of-speech information can be used as a term selection technique when constructing the bag-of-words representation of documents.
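The sketch below illustrates the general idea behind the positive finding: using part-of-speech tags to select which words enter a bag-of-words vector, then training a linear SVM on those vectors. It is not the authors' pipeline, and it rests on several assumptions. Documents are taken to be already tagged as (word, tag) pairs with Universal-Dependencies-style tag names (the tagger is not shown). TF-IDF is used as one example weighting, and the SVM is trained with the Pegasos sub-gradient method, which stands in for whatever SVM implementation the paper used. The string-of-words and parse-tree experiments are not covered, and the toy sentences are invented.

```python
import math
import random
from collections import Counter

# Assumption: input is pre-tagged (word, tag) pairs with UD-style tag names.
CONTENT_POS = {"NOUN", "PROPN", "ADJ", "VERB"}


def select_terms(tagged_doc, keep_pos=CONTENT_POS):
    """POS-based term selection: keep only words whose tag is in keep_pos."""
    return [word.lower() for word, tag in tagged_doc if tag in keep_pos]


def fit_idf(docs):
    """Build the vocabulary and a smoothed IDF table from training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1.0 for term in df}
    return vocab, idf


def to_vector(doc, vocab, idf):
    """L2-normalised TF-IDF bag-of-words vector; out-of-vocabulary terms are dropped."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(doc).items():
        if term in vocab:
            vec[vocab[term]] = count * idf[term]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM without bias, trained by Pegasos sub-gradient steps on the hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink from the L2 regulariser
            if margin < 1.0:  # hinge loss is active: step towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Invented, hand-tagged toy sentences: +1 = sport, -1 = economy.
train = [
    ([("O", "DET"), ("Benfica", "PROPN"), ("venceu", "VERB"), ("o", "DET"), ("jogo", "NOUN")], 1),
    ([("O", "DET"), ("golo", "NOUN"), ("decidiu", "VERB"), ("o", "DET"), ("jogo", "NOUN")], 1),
    ([("O", "DET"), ("banco", "NOUN"), ("subiu", "VERB"), ("os", "DET"), ("juros", "NOUN")], -1),
    ([("A", "DET"), ("inflação", "NOUN"), ("afetou", "VERB"), ("os", "DET"), ("juros", "NOUN")], -1),
]
docs = [select_terms(doc) for doc, _ in train]
labels = [label for _, label in train]
vocab, idf = fit_idf(docs)
X = [to_vector(doc, vocab, idf) for doc in docs]
w = train_linear_svm(X, labels)

query = [("Um", "DET"), ("jogo", "NOUN"), ("emocionante", "ADJ")]
print(predict(w, to_vector(select_terms(query), vocab, idf)))  # prints 1
```

Determiners such as "o" and "os" are discarded before weighting. This is the kind of filtering the abstract describes as useful term selection. A real setup would add more classes, a held-out evaluation set and a tuned regularisation parameter.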

Published in: Seventh Mexican International Conference on Artificial Intelligence (MICAI '08), 2008

Date of Conference: 27-31 Oct. 2008