Effective and efficient representation of the low-level features of groups of frames or shots is an important yet challenging task for video analysis and retrieval. Key frame-based representation is limited by the difficulty of detecting shot boundaries at gradual transitions and by the variety of key frame extraction methods. In this paper, we employ mean shift-based mode seeking to develop a new approach for compact representation of a video segment. The proposed representation is motivated by the observation that, on the global level, humans perceive images only as a combination of the few most prominent colors. We exploit spatiotemporal mode seeking in feature space to simulate the "subjectivity" of human decisions in video segment retrieval and identification. The effectiveness of the video representation and matching scheme is shown by initial experiments on replay detection in broadcast sports videos.
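To make the idea of mode seeking concrete, the following is a minimal sketch of generic mean shift applied to color samples. It is not the method of the paper: the Gaussian kernel, the bandwidth value, the merging rule, the function name `mean_shift_modes`, and the synthetic "pixels" are all assumptions chosen for illustration, and the spatial and temporal coordinates that a spatiotemporal variant would include are omitted. It only shows how a set of color vectors can be reduced to a few prominent modes with associated weights.

```python
import math
import random


def mean_shift_modes(points, bandwidth, max_iter=100, tol=1e-5):
    """Gaussian-kernel mean shift (illustrative sketch).

    Returns a list of (mode, weight) pairs sorted by decreasing weight,
    where weight is the fraction of input points attracted to that mode.
    """
    dim = len(points[0])
    shifted = [list(p) for p in points]  # one moving estimate per input point
    for _ in range(max_iter):
        largest_move = 0.0
        for i, x in enumerate(shifted):
            num = [0.0] * dim
            den = 0.0
            for p in points:
                d2 = sum((a - b) ** 2 for a, b in zip(x, p))
                w = math.exp(-0.5 * d2 / bandwidth ** 2)  # kernel weight
                den += w
                for k in range(dim):
                    num[k] += w * p[k]
            new_x = [v / den for v in num]  # kernel-weighted mean of the data
            step = max(abs(a - b) for a, b in zip(new_x, x))
            largest_move = max(largest_move, step)
            shifted[i] = new_x
        if largest_move < tol:
            break

    # Merge converged estimates that lie within one bandwidth of each other
    # (an assumed, simple merging rule).
    modes = []  # each entry: [mode_vector, count]
    for x in shifted:
        for entry in modes:
            if math.dist(x, entry[0]) < bandwidth:
                entry[1] += 1
                break
        else:
            modes.append([x, 1])
    modes.sort(key=lambda e: e[1], reverse=True)
    return [(tuple(m), c / len(points)) for m, c in modes]


# Synthetic stand-in for pixel colors from a video segment (hypothetical data):
# 60 greenish samples and 40 reddish samples, RGB values in [0, 1].
random.seed(0)


def cluster(center, n, spread=0.02):
    return [tuple(c + random.gauss(0, spread) for c in center) for _ in range(n)]


pixels = cluster((0.2, 0.6, 0.2), 60) + cluster((0.8, 0.1, 0.1), 40)
signature = mean_shift_modes(pixels, bandwidth=0.1)
for mode, weight in signature:
    print([round(v, 2) for v in mode], round(weight, 2))
```

Under these assumptions, the resulting list of modes and weights acts as a compact color signature for the segment. How such signatures are compared for retrieval or replay detection is not specified in the text above and is left out of the sketch.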