This paper treats gunshot detection in audio streams from movies as a maximization task, which is solved by dynamic programming. The proposed method seeks the sequence of segments and corresponding class labels (gunshot vs. all other audio types) that maximizes the product of the posterior class label probabilities given the segments' data. The required posterior probabilities are estimated by combining soft classification decisions from a set of Bayesian Network combiners. Tests on a large set of audio streams indicate that the proposed method achieves high precision and recall for detected gunshot events.
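The maximization described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the stream is already split into frames, that segments are contiguous frame ranges with a bounded length, and that a callable returns the posterior probability of a label for a given segment. The names `best_segmentation`, `toy_posterior`, `min_len`, and `max_len` are hypothetical, and the toy posterior merely stands in for the Bayesian Network combiners, which are not modelled here. The product of posteriors is maximized as a sum of logarithms to avoid numerical underflow.

```python
import math
from typing import Callable, List, Optional, Tuple

# (start_frame, end_frame_exclusive, label); label 1 = gunshot, 0 = other (assumed encoding)
Segment = Tuple[int, int, int]


def best_segmentation(
    n_frames: int,
    n_classes: int,
    posterior: Callable[[int, int, int], float],
    min_len: int = 1,
    max_len: Optional[int] = None,
) -> Tuple[List[Segment], float]:
    """Jointly pick segment boundaries and labels that maximize the
    product of per-segment posteriors (computed as a sum of logs)."""
    if max_len is None:
        max_len = n_frames
    neg_inf = -math.inf
    # best[t] = best log-score of any labelled segmentation of frames [0, t)
    best = [neg_inf] * (n_frames + 1)
    back: List[Optional[Tuple[int, int]]] = [None] * (n_frames + 1)
    best[0] = 0.0
    for end in range(1, n_frames + 1):
        for length in range(min_len, min(max_len, end) + 1):
            start = end - length
            if best[start] == neg_inf:
                continue  # no valid segmentation ends at `start`
            for label in range(n_classes):
                p = posterior(start, end, label)
                if p <= 0.0:
                    continue  # log undefined; treat as impossible
                score = best[start] + math.log(p)
                if score > best[end]:
                    best[end] = score
                    back[end] = (start, label)
    if best[n_frames] == neg_inf:
        raise ValueError("no valid segmentation under the length constraints")
    # Follow the back-pointers from the end of the stream to recover segments.
    segments: List[Segment] = []
    end = n_frames
    while end > 0:
        start, label = back[end]
        segments.append((start, end, label))
        end = start
    segments.reverse()
    return segments, best[n_frames]


# Hypothetical stand-in for the classifier: per-frame gunshot scores,
# averaged over the segment. Purely illustrative values.
FRAME_SCORES = [0.1, 0.1, 0.9, 0.9, 0.1]


def toy_posterior(start: int, end: int, label: int) -> float:
    mean = sum(FRAME_SCORES[start:end]) / (end - start)
    return mean if label == 1 else 1.0 - mean


if __name__ == "__main__":
    segs, log_score = best_segmentation(len(FRAME_SCORES), 2, toy_posterior)
    print(segs, math.exp(log_score))
```

Because every posterior is at most 1, the product criterion by itself tends to favor fewer, longer segments; the `min_len` and `max_len` bounds in this sketch are one assumed way to constrain the search and also keep the cost at roughly `n_frames * max_len * n_classes` posterior evaluations.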
Published in:
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Date of Conference: March 31 2008-April 4 2008