Abstract:
Sentiment classification of tweets is used for a variety of social sensing tasks and provides a means of discerning public opinion on a wide range of topics. A potential ...Show MoreMetadata
Abstract:
Sentiment classification of tweets is used for a variety of social sensing tasks and provides a means of discerning public opinion on a wide range of topics. A potential concern when performing sentiment classification is that the training data may contain class imbalance, which can negatively affect classification performance. A classifier trained on imbalanced data may be biased in favor of the majority class. One possibile method of addressing this is to use data sampling to achieve a more balanced class distribution. In this work, we seek to observe how data sampling (using random undersampling with either a 50:50 or 35:65 positive:negative post-sampling class distribution ratio) affects the classification performance on tweet sentiment data. Our experimental results show that Random Undersampling significantly improves classification performance in comparison to not using any data sampling. Furthermore, there is no significant difference between selecting a 50:50 or 35:65 post-sampling class distribution ratio.
Date of Conference: 13-15 August 2015
Date Added to IEEE Xplore: 26 October 2015
Electronic ISBN:978-1-4673-6656-4