Accurate SVM Text Classification for Highly Skewed Data Using Threshold Tuning and Query-Expansion-Based Feature Selection

2 Author(s)
B. Goertzel; Applied Research Laboratory for National and Homeland Security at Virginia Tech's National Capital Operation, 2000 N. 15th St., Suite 503, Arlington, VA 22201; J. Venuto

A novel technique is described in which Support Vector Machines (SVMs) perform relatively effective text categorization from small numbers of positive examples (fewer than 10 in some cases). In addition to the positive examples, a query describing the positive category is assumed to be given, in the form of a set of key phrases or a sentence. The technique combines two innovations: (1) adjusting the SVM score threshold based on the distribution of scores across the training set, and (2) a feature selection method that retains only features that display semantic association with the content words in the query, according to a word-association database produced by statistical analysis of a parsed corpus. Examples are given for a number of test cases drawn from the Reuters and FBIS news archives.
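The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the abstract does not say how the threshold is derived from the score distribution or how association strength is measured. The sketch assumes a threshold chosen to maximize F1 on the training scores and a hypothetical association table keyed by (query word, feature) pairs. The names `select_features`, `tune_threshold`, `association`, and `min_strength` are illustrative assumptions, and the scores are made-up numbers standing in for SVM decision values.

```python
from typing import Dict, Iterable, List, Sequence, Set, Tuple


def select_features(
    vocabulary: Iterable[str],
    query_words: Set[str],
    association: Dict[Tuple[str, str], float],
    min_strength: float = 0.1,  # assumed cutoff, not taken from the paper
) -> List[str]:
    """Keep query words plus any feature associated with a query word."""
    kept = []
    for feature in vocabulary:
        if feature in query_words:
            kept.append(feature)
            continue
        # Strongest link between this feature and any query content word.
        strength = max(
            (association.get((q, feature), 0.0) for q in query_words),
            default=0.0,
        )
        if strength >= min_strength:
            kept.append(feature)
    return kept


def tune_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the score cutoff that maximizes F1 on the training set.

    Assumed criterion: the abstract only says the threshold is altered
    based on the distribution of training scores.
    """
    best_threshold, best_f1 = 0.0, -1.0  # 0.0 is the usual SVM default
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_threshold, best_f1 = t, f1
    return best_threshold


# Hypothetical association table and vocabulary.
assoc = {("oil", "crude"): 0.8, ("oil", "barrel"): 0.5, ("price", "football"): 0.02}
vocab = ["oil", "crude", "barrel", "football", "price"]
print(select_features(vocab, {"oil", "price"}, assoc))
# ['oil', 'crude', 'barrel', 'price']

# Made-up scores in which every value is negative, so the default
# cutoff of 0 would label nothing as positive.
scores = [-1.2, -0.9, -0.8, -0.4, -0.3, -0.1]
labels = [0, 0, 0, 0, 1, 1]
print(tune_threshold(scores, labels))
# -0.3
```

In this toy case the default cutoff of 0 would find no positives at all, while the tuned cutoff of -0.3 recovers both. That is the kind of shift a score-distribution-based threshold is meant to produce on highly skewed data.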

Published in:

The 2006 IEEE International Joint Conference on Neural Network Proceedings
