Web forums are frequently used as platforms for exchanging information and opinions, as well as for disseminating propaganda. Online content can be misused, however, when the information being distributed, such as radical opinions, is unsolicited or inappropriate. Radical opinion is well hidden and dispersed across Web forums, while non-radical content is unspecific and topically more diverse. Labeling a large amount of radical content (positive examples) and non-radical content (negative examples) for training classification systems is costly and time-consuming. By contrast, large volumes of unlabeled content are easy to obtain from Web forums. In this paper, we propose and develop a topic-sensitive partially supervised learning approach to address the difficulties of identifying radical opinions in hate group Web forums. Specifically, we design a labeling heuristic to extract high-quality positive and negative examples from unlabeled datasets. Empirical evaluation results from two large hate group Web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance than its counterparts.
Date of Conference: 11-14 June 2012