[This paper has been withdrawn by the publisher]. A novel method for identifying violent videos using only audio features is introduced. Most previous content-based image or video classification schemes apply the bag of words (BOW) or bag of visual words (BOVW) model, which employs multiple visual features to characterize image or video content. Our method instead builds a bag of audio words (BOAW) from effective audio features, for two reasons. First, audio features are expected to be especially informative for violent videos. Second, the computational complexity of processing audio features is much lower than that of processing visual features. MPEG-7 low-level features such as Audio Spectrum Centroid and Audio Spectrum Spread, and high-level features such as Audio Signature, are combined into one 44-dimensional vector in the BOAW model. The audio words are built from these vectors by clustering, and a support vector machine (SVM) with a revised soft-weighting scheme is used to classify the audio-word features into two classes: violent and non-violent. Experiments demonstrate that the proposed method achieves good recall and precision in detecting violent videos. The method can also be applied to classify other types of videos.
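The clustering and soft-weighting steps described in the abstract can be illustrated with a minimal sketch. It is not the withdrawn paper's implementation: the abstract does not specify the clustering algorithm or the "revised" soft-weighting scheme, so plain k-means and a simple rank-based weighting (each frame votes for its nearest words with weight 1/2^rank) are assumed here. The function names, the codebook size, and the random toy data standing in for 44-dimensional MPEG-7 descriptors are all hypothetical. The SVM stage is omitted; the resulting histograms are the kind of per-clip vector a two-class SVM would take as input.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_codebook(frames, k, iters=10, seed=0):
    """Plain k-means (Lloyd's algorithm); the k centroids act as audio words.

    Assumption: the abstract only says "clustering", so k-means is a stand-in.
    """
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in frames:
            nearest = min(range(k), key=lambda c: sq_dist(f, centroids[c]))
            groups[nearest].append(f)
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def soft_histogram(frames, centroids, top_n=3):
    """Soft-weighted bag of audio words for one clip.

    Assumption: each frame votes for its top_n nearest words with weight
    1 / 2**rank (rank 0 = nearest); the histogram is then L1-normalised.
    """
    hist = [0.0] * len(centroids)
    for f in frames:
        ranked = sorted(range(len(centroids)),
                        key=lambda c: sq_dist(f, centroids[c]))
        for rank, c in enumerate(ranked[:top_n]):
            hist[c] += 1.0 / (2 ** rank)
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Toy data standing in for 44-dimensional per-frame audio descriptors.
rng = random.Random(1)
DIM = 44
clip_a = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(60)]
clip_b = [[rng.gauss(3.0, 1.0) for _ in range(DIM)] for _ in range(60)]

# Build the audio-word codebook from all frames, then describe each clip.
codebook = build_codebook(clip_a + clip_b, k=4)
hist_a = soft_histogram(clip_a, codebook)
hist_b = soft_histogram(clip_b, codebook)
```

Each clip is reduced to one fixed-length histogram regardless of its duration, which is what makes a standard two-class classifier applicable to clips of different lengths.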
Note: Document Suppressed in IEEE Xplore. The document that should appear here has been removed because it was submitted for publication without proper authorization. The article was not written by the author of record noted in the bibliographic data. Mr. Li was neither aware of this having been published in his name, nor is he responsible for any content as written. We regret any inconvenience.