Audio content analysis is useful in many multimedia applications. We present a unified framework for the content analysis of composite audio. The framework is designed to extract relevant information from the different available audio modalities and to discover the high-level semantics conveyed by the data. We also demonstrate an implementation of the proposed framework for detecting scenes and events in various TV shows and movies. In this implementation, key audio effects are first extracted as a mid-level representation, and a Bayesian network is then used to infer high-level semantics. Experiments on 12 hours of audio data indicate that the proposed framework performs satisfactorily.
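The abstract does not specify the network structure, the set of key audio effects, or any probabilities. The sketch below is therefore only an illustration of the general idea of inferring a scene from detected key audio effects. It assumes a simple two-layer Bayesian network (naive-Bayes structure): a single scene node whose children are binary key-effect nodes, each conditionally independent of the others given the scene. The scene labels, effect names, and every number in `PRIOR` and `LIKELIHOOD` are hypothetical and are not taken from the paper. The function `infer_scene` is likewise a hypothetical helper, not the authors' method.

```python
from math import prod
from typing import Dict

# Hypothetical prior P(scene); illustrative values, not from the paper.
PRIOR: Dict[str, float] = {"humor": 0.3, "action": 0.3, "other": 0.4}

# Hypothetical P(effect detected | scene) for each binary key-audio-effect node.
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "laughter":  {"humor": 0.80, "action": 0.10, "other": 0.10},
    "explosion": {"humor": 0.05, "action": 0.70, "other": 0.05},
    "gunshot":   {"humor": 0.05, "action": 0.60, "other": 0.05},
}


def infer_scene(detected: Dict[str, bool]) -> Dict[str, float]:
    """Exact posterior P(scene | observed effects) for a two-layer network
    in which the effects are conditionally independent given the scene."""
    unnormalized = {}
    for scene, p_scene in PRIOR.items():
        # Multiply P(e_i | scene) for detected effects and 1 - P(e_i | scene)
        # for effects observed as absent; unobserved effects are marginalized out.
        p_evidence = prod(
            LIKELIHOOD[effect][scene] if present else 1.0 - LIKELIHOOD[effect][scene]
            for effect, present in detected.items()
        )
        unnormalized[scene] = p_scene * p_evidence
    total = sum(unnormalized.values())
    # Normalize so that the posterior sums to 1 (Bayes' rule).
    return {scene: value / total for scene, value in unnormalized.items()}


if __name__ == "__main__":
    posterior = infer_scene({"laughter": True, "explosion": False, "gunshot": False})
    print(max(posterior, key=posterior.get), posterior)
```

With laughter detected and no explosion or gunshot, the "humor" scene receives most of the posterior mass. An effect left out of `detected` contributes no factor, which corresponds to marginalizing out an unobserved node. A real system would learn its structure and conditional probabilities from labeled data rather than hand-setting them.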