Content-based retrieval has become a suitable approach to managing video databases. The word "video" refers to both the image frames and the audio waveform contained in a video, yet only a few approaches make use of the audio information. When either audio or visual information alone is not sufficient, combining audio and visual cues may resolve the ambiguities in the individual modalities and thereby help to obtain more accurate answers. In this paper, we present a novel idea of integrating audio features with visual features for video segmentation and retrieval.