In this paper, we develop descriptors based on perceptual-level motion features such as time-to-collision, shot transitions, and temporal motion, and we show that including them significantly enhances the representational level of the video classes; for example, violence can be detected. Temporal context cues, which have been largely neglected by existing content-based retrieval (CBR) systems, are integrated into the framework. We design a dynamic Bayesian framework for CBR systems that learns the temporal structure through the fusion of all the features. Experimental results on more than 4 hours of video are presented for a number of key applications, including sequence identification, highlight extraction for sports, and climax or violence detection.
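To illustrate the kind of temporal-fusion inference such a dynamic Bayesian framework performs, the sketch below runs Viterbi decoding on a two-state hidden Markov model (the simplest dynamic Bayesian network) that fuses two per-frame motion descriptors into a "normal" vs. "climax" label. All state names, descriptor names, and parameter values here are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

# Hypothetical sketch: a two-state HMM fusing per-frame motion descriptors.
# The descriptors (motion_energy, cut_rate) and all parameters are
# illustrative, not taken from the paper.

STATES = ["normal", "climax"]

# Transition matrix A[i, j] = P(state_t = j | state_{t-1} = i):
# the temporal structure lives here (states tend to persist).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])

pi = np.array([0.95, 0.05])  # initial state distribution

def emission(obs):
    """Likelihood of a fused observation under each state.

    obs = (motion_energy, cut_rate), both in [0, 1]; modeled with
    independent per-state Gaussians (naive-Bayes fusion of features).
    """
    means = np.array([[0.2, 0.1],   # "normal": low motion, few cuts
                      [0.8, 0.6]])  # "climax": high motion, rapid cuts
    var = 0.05
    diff = np.asarray(obs) - means           # shape: (2 states, 2 features)
    return np.exp(-(diff ** 2).sum(axis=1) / (2 * var))

def viterbi(observations):
    """Most likely state sequence for a sequence of fused observations."""
    delta = pi * emission(observations[0])
    back = []
    for obs in observations[1:]:
        trans = delta[:, None] * A           # score of each (prev, cur) pair
        back.append(trans.argmax(axis=0))    # best predecessor per state
        delta = trans.max(axis=0) * emission(obs)
    # Backtrack from the best final state.
    path = [int(delta.argmax())]
    for ptr in reversed(back):
        path.append(int(ptr[path[-1]]))
    return [STATES[s] for s in reversed(path)]

# Frames ramp from calm activity to high-motion, rapid-cut activity.
frames = [(0.15, 0.05), (0.2, 0.1), (0.75, 0.55), (0.85, 0.65), (0.8, 0.6)]
print(viterbi(frames))
# → ['normal', 'normal', 'climax', 'climax', 'climax']
```

The transition matrix is what distinguishes this from frame-by-frame classification: a state that has just begun is assumed likely to continue, so isolated noisy frames do not flip the label.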