We propose a context-based model of video abstraction that exploits both audio and video features, and we apply it to tennis TV programs. The model can automatically produce different types of summaries of a given video, depending on the user's constraints or preferences. We first designed an efficient and accurate temporal segmentation of the video into segments that are homogeneous with respect to camera motion. We then introduce original visual descriptors related to the dominant and residual image motions. The different summary types are obtained by specifying adapted classification criteria, which involve audio features, to select the relevant segments to include in the video abstract. The proposed scheme has been validated on 22 hours of tennis videos.
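As a rough illustration of the two-stage idea described above (segment by camera motion, then select segments using audio cues), the following is a minimal sketch and not the authors' method. It assumes per-frame camera-motion labels and a per-frame audio energy value are already available; the function names `segment_by_motion` and `select_segments`, the label strings, the energy threshold, and the frame budget are all hypothetical.

```python
from itertools import groupby


def segment_by_motion(labels):
    """Group consecutive frames with the same (assumed) camera-motion label.

    Returns a list of (start, end, label) tuples, with `end` exclusive.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments


def select_segments(segments, audio_energy, threshold, max_frames):
    """Pick segments for the abstract using a hypothetical audio criterion.

    A segment qualifies if its mean audio energy is at least `threshold`.
    Qualifying segments are added greedily, loudest first, while the total
    length stays within `max_frames`. The result is in temporal order.
    """
    scored = []
    for start, end, label in segments:
        mean_energy = sum(audio_energy[start:end]) / (end - start)
        if mean_energy >= threshold:
            scored.append((mean_energy, start, end, label))
    scored.sort(reverse=True)

    chosen, used = [], 0
    for _, start, end, label in scored:
        if used + (end - start) <= max_frames:
            chosen.append((start, end, label))
            used += end - start
    return sorted(chosen)


# Toy input: 9 frames with made-up motion labels and audio energies.
labels = ["static"] * 3 + ["pan"] * 2 + ["static"] * 4
audio = [0.1, 0.1, 0.1, 0.9, 0.9, 0.5, 0.5, 0.5, 0.5]

segments = segment_by_motion(labels)
summary = select_segments(segments, audio, threshold=0.4, max_frames=5)
```

Changing `threshold` or `max_frames` stands in for the user constraints mentioned in the abstract: a tighter frame budget yields a shorter summary, while a different selection criterion would yield a different summary type.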