This paper introduces a new method for identifying violent videos using a bag of audio words. MPEG-7 audio descriptors are first extracted, including low-level features such as AudioSpectrumCentroid and AudioSpectrumSpread. The audio words are then built from the MPEG-7 high-level descriptor AudioSignature, which is regarded as the "fingerprint" of the audio stream. A support vector machine classifies the resulting feature vectors into two classes: violent and non-violent videos. Experimental results demonstrate that the method achieves good recall accuracy.
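
The bag-of-audio-words step can be illustrated with a minimal sketch. The abstract does not say how audio words are assigned, so this example assumes nearest-codeword quantization against a precomputed codebook, followed by a normalized histogram per clip. The function names, the codebook, and the 2-D frame values are hypothetical stand-ins for real AudioSignature descriptors, and the SVM classification step is omitted.

```python
import math
from collections import Counter


def nearest_codeword(frame, codebook):
    """Return the index of the codeword closest to `frame` (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(frame, codebook[i]))


def bag_of_audio_words(frames, codebook):
    """Assign each frame to its nearest codeword and return a normalized histogram.

    The histogram has one bin per codeword and sums to 1 for a non-empty clip.
    """
    if not frames:
        return [0.0] * len(codebook)
    counts = Counter(nearest_codeword(f, codebook) for f in frames)
    total = len(frames)
    return [counts.get(i, 0) / total for i in range(len(codebook))]


# Hypothetical codebook of three "audio words" (assumed to be learned
# beforehand, e.g. by clustering training descriptors).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0)]

# Hypothetical per-frame descriptors for one video clip.
frames = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (3.9, 4.2)]

histogram = bag_of_audio_words(frames, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Each clip is thus reduced to a fixed-length vector regardless of its duration, which is the kind of feature vector a two-class classifier such as a support vector machine can consume.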