
Supervised term weighting for sentiment analysis


Authors:
Tam T. Nguyen (School of Computer Engineering, Nanyang Technological University, Singapore); Kuiyu Chang; Siu Cheung Hui

Vector space text classification is commonly used in intelligence applications such as email and conversation analysis. In this paper we propose a supervised term weighting scheme called tf × KL (term frequency Kullback-Leibler), which weights each word in proportion to the ratio of its document frequencies in the positive and negative classes. We then generalize tf × KL to deal effectively with class imbalance, which is very common in real-world intelligence analysis. The generalized tf × KL weights each word according to the ratio of its class-conditioned probabilities in the positive and negative classes instead of the raw document frequencies. Results on four classification datasets show that tf × KL performs consistently better than the baseline tf × idf and four other supervised term weighting schemes, including the recently proposed tf × rf (term frequency relevance frequency). The generalized tf × KL proved extremely robust to highly skewed class distributions, beating the runner-up by more than 20% on a dataset with only 10% positive training examples. The generalized tf × KL is thus an effective and robust term weighting scheme that can significantly improve binary classification performance in sentiment analysis and intelligence applications.
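The abstract does not state the exact tf × KL formula, so the following is only a minimal sketch of the idea it describes, under explicit assumptions. It contrasts a weight built from the raw per-class document frequencies with one built from class-conditioned probabilities P(term | class). The smoothed absolute log-ratio, the smoothing constant `alpha`, and the function names `document_frequencies`, `kl_style_weights` and `weight_document` are all hypothetical choices for illustration, not the paper's definition.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def document_frequencies(docs: Sequence[List[str]],
                         labels: Sequence[int]) -> Tuple[Counter, Counter]:
    """Count, per class, how many documents contain each term (1 = positive, 0 = negative)."""
    df_pos: Counter = Counter()
    df_neg: Counter = Counter()
    for tokens, y in zip(docs, labels):
        target = df_pos if y == 1 else df_neg
        target.update(set(tokens))  # set(): count each term at most once per document
    return df_pos, df_neg


def kl_style_weights(docs: Sequence[List[str]], labels: Sequence[int],
                     generalized: bool = True, alpha: float = 1.0) -> Dict[str, float]:
    """Hypothetical per-term weight: |log(p / q)| with add-alpha smoothing (an assumption)."""
    df_pos, df_neg = document_frequencies(docs, labels)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    weights: Dict[str, float] = {}
    for t in set(df_pos) | set(df_neg):
        if generalized:
            # Class-conditioned probabilities P(t | class): normalizing by class size
            # keeps a small positive class from being swamped by the larger one.
            p = (df_pos[t] + alpha) / (n_pos + 2 * alpha)
            q = (df_neg[t] + alpha) / (n_neg + 2 * alpha)
        else:
            # Raw (smoothed) document frequencies, ignoring class sizes.
            p = df_pos[t] + alpha
            q = df_neg[t] + alpha
        weights[t] = abs(math.log(p / q))
    return weights


def weight_document(tokens: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """tf x weight: multiply each term's raw count in the document by its supervised weight."""
    tf = Counter(tokens)
    return {t: count * weights.get(t, 0.0) for t, count in tf.items()}


if __name__ == "__main__":
    # Toy imbalanced corpus: 2 positive documents, 4 negative documents.
    docs = [["good", "great"], ["good"], ["bad", "good"], ["bad"], ["bad", "awful"], ["bad"]]
    labels = [1, 1, 0, 0, 0, 0]
    print(kl_style_weights(docs, labels, generalized=False))
    print(kl_style_weights(docs, labels, generalized=True))
    print(weight_document(["good", "good", "great"], kl_style_weights(docs, labels)))
```

On this toy corpus the raw-frequency variant gives "great" and "awful" the same weight (log 2), because each occurs in exactly one document. The class-conditioned variant ranks "great" higher (log 3 versus about 0.29), because "great" appears in half of the small positive class while "awful" appears in only a quarter of the negative class. That difference illustrates the kind of imbalance correction the abstract attributes to the generalized scheme.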

Published in:

Intelligence and Security Informatics (ISI), 2011 IEEE International Conference on

Date of Conference:

10-12 July 2011