In this paper, we propose the Bag-of-Importance (BoI) model, a novel local-feature-based approach to static video summarization. Most existing approaches instead characterize each video frame with global features to derive its importance. Since local features such as interest points are more discriminative in characterizing visual content, we formulate static video summarization as the problem of identifying representative frames that contain more important local features, where the representativeness of a frame is the aggregate importance of the local features it contains. To derive the importance of each local feature for a given video, we employ sparse coding to project each local feature into a sparse space, calculate the l2 norm of its sparse coefficients, and generate the BoI representation from the distribution of importance over all local features in the video. To account for the perceptual difference among spatial regions of a frame, a spatial weighting template is used to differentiate the importance of local features within individual frames. The proposed scheme thus exploits both the inter-frame and intra-frame properties of local features, which allows the selected frames to capture both the dominant content and the discriminative details of a video. Experimental results on a dataset spanning several genres demonstrate that the proposed approach clearly outperforms the state-of-the-art method.
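The frame-scoring step described above can be illustrated with a minimal sketch. It assumes the sparse coefficients of each local feature have already been computed by some sparse coding routine, which is not shown. The function names, the tuple layout `(frame_idx, x, y, coeffs)`, the Gaussian centre-weighted template, and the top-k selection rule are all hypothetical choices made for illustration; the abstract does not specify the form of the spatial weighting template or the selection procedure.

```python
import math
from collections import defaultdict


def l2_norm(coeffs):
    """Importance of one local feature: l2 norm of its sparse coefficients."""
    return math.sqrt(sum(c * c for c in coeffs))


def center_weight(x, y, width, height, sigma=0.5):
    """Hypothetical spatial template: Gaussian falloff from the frame centre.

    Offsets are normalized by the frame size; the real template may differ.
    """
    dx = (x - width / 2) / width
    dy = (y - height / 2) / height
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))


def frame_scores(features, width, height):
    """Aggregate the spatially weighted importance of features per frame."""
    scores = defaultdict(float)
    for frame_idx, x, y, coeffs in features:
        weight = center_weight(x, y, width, height)
        scores[frame_idx] += weight * l2_norm(coeffs)
    return dict(scores)


def select_keyframes(scores, k):
    """Assumed selection rule: keep the k highest-scoring frames, in time order."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return sorted(ranked[:k])


# Toy input: (frame index, x, y, sparse coefficients), values made up.
features = [
    (0, 160, 120, [0.0, 3.0, 4.0]),  # strong feature at the frame centre
    (0, 0, 0, [1.0, 0.0, 0.0]),      # weak feature in a corner
    (1, 160, 120, [0.0, 0.0, 1.0]),
    (2, 160, 120, [2.0, 0.0, 0.0]),
]
scores = frame_scores(features, width=320, height=240)
keyframes = select_keyframes(scores, k=2)
print(scores)
print(keyframes)
```

In this sketch a frame scores highly either because it holds many features or because its features have large sparse coefficients, and features near the frame centre count for more than those near the border. A plain top-k rule may pick near-duplicate frames, so a practical system would likely add some redundancy control.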