This paper presents a method for detecting violent content on video-sharing sites. The proposed approach fuses three modalities: audio, moving-image and text data, the last being collected from the accompanying user comments. The problem is treated as a binary classification task (violent vs. non-violent content) on a 9-dimensional feature space, where 7 of the 9 features are extracted from the audio stream. The method has been evaluated on 210 YouTube videos and reached an overall accuracy of 82%.
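To make the fused feature space concrete, the following minimal sketch concatenates per-modality features into one 9-dimensional vector and assigns it to one of two classes. It rests on assumptions the abstract does not state: the two non-audio features are split as one visual and one text feature, and a nearest-centroid rule stands in for the classifier, which the abstract does not name. The function names and toy values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def fuse_features(audio: Sequence[float],
                  visual: Sequence[float],
                  text: Sequence[float]) -> List[float]:
    """Concatenate per-modality features into a single 9-dimensional vector.

    Assumption: 7 audio features (as stated in the abstract) plus
    1 visual and 1 text feature (the 1/1 split is a guess).
    """
    if len(audio) != 7:
        raise ValueError("expected 7 audio features")
    fused = list(audio) + list(visual) + list(text)
    if len(fused) != 9:
        raise ValueError("expected 9 features in total")
    return fused


def train_nearest_centroid(vectors: Sequence[Sequence[float]],
                           labels: Sequence[str]) -> Dict[str, List[float]]:
    """Compute one mean vector (centroid) per class label.

    Placeholder classifier only; the paper's actual classifier may differ.
    """
    by_label: Dict[str, List[Sequence[float]]] = {}
    for vec, label in zip(vectors, labels):
        by_label.setdefault(label, []).append(vec)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in by_label.items()
    }


def predict(model: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(model, key=lambda label: sq_dist(model[label], x))


# Toy, made-up feature values purely for illustration.
train_x = [
    fuse_features([1.0] * 7, [1.0], [1.0]),
    fuse_features([0.0] * 7, [0.0], [0.0]),
]
train_y = ["violent", "non-violent"]
model = train_nearest_centroid(train_x, train_y)
query = fuse_features([0.9] * 7, [0.8], [1.0])
label = predict(model, query)
```

Concatenating the features before classification is only one way to fuse modalities; combining the outputs of per-modality classifiers would be an alternative, and the abstract alone does not say which the authors used.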
Published in:
2010 20th International Conference on Pattern Recognition (ICPR)
Date of Conference: 23-26 Aug. 2010