A bag-of-importance model for video summarization

5 Author(s)
Shiyang Lu ; Univ. of Sydney, Sydney, NSW, Australia ; Zhiyong Wang ; Yuan Song ; Tao Mei

In this paper, we propose a novel local-feature-based approach, the Bag-of-Importance (BoI) model, for static video summarization, whereas most existing approaches characterize each video frame with global features to derive its importance. Since local features such as interest points are more discriminative in characterizing visual content, we formulate static video summarization as the problem of identifying representative frames that contain more important local features, where the representativeness of each frame is the aggregation of the importance of the local features it contains. To derive the importance of each local feature for a given video, we employ sparse coding to project each local feature into a sparse space, calculate the ℓ2 norm of the sparse coefficients for each local feature, and generate the BoI representation from the distribution of importance over all the local features in the video. To further account for the perceptual difference among spatial regions of a frame, a spatial weighting template is used to differentiate the importance of local features within individual frames. The proposed video summarization scheme exploits both the inter-frame and intra-frame properties of local features, which allows the selected frames to capture both the dominant content and the discriminative details within a video. Experimental results on a dataset spanning several genres demonstrate that the proposed approach clearly outperforms the state-of-the-art method.
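
The pipeline described in the abstract (sparse-code each local feature, take the ℓ2 norm of its coefficients as its importance, weight by spatial position, aggregate per frame) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the sparse solver, the dictionary, or the form of the spatial template, so the ISTA solver, the Gaussian centre-weighted template, the top-k frame selection, and all function and parameter names (`sparse_codes`, `center_weight`, `frame_scores`, `select_keyframes`, `lam`, `sigma`) are assumptions made here for illustration.

```python
import numpy as np

def sparse_codes(X, D, lam=0.1, n_iter=200):
    """Sparse-code each row of X over dictionary D (rows are atoms).

    ASSUMPTION: plain ISTA on 0.5*||x - a D||^2 + lam*||a||_1; the paper's
    actual solver and dictionary are not given in the abstract.
    X: (n_features, dim), D: (n_atoms, dim). Returns (n_features, n_atoms).
    """
    L = np.linalg.norm(D @ D.T, 2)  # Lipschitz constant of the gradient
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = (A @ D - X) @ D.T
        Z = A - grad / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)  # soft threshold
    return A

def center_weight(positions, sigma=0.3):
    """ASSUMPTION: Gaussian centre-weighted template over normalized (x, y)
    positions in [0, 1]^2; the paper's template may differ."""
    d2 = np.sum((positions - 0.5) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def frame_scores(importance, frame_ids, weights, n_frames):
    """Aggregate spatially weighted feature importance per frame."""
    return np.bincount(frame_ids, weights=importance * weights,
                       minlength=n_frames)

def select_keyframes(scores, k):
    """ASSUMPTION: keep the k highest-scoring frames, in temporal order."""
    return np.sort(np.argsort(scores)[-k:])

# Synthetic stand-in data: 60 local descriptors spread over 6 frames.
rng = np.random.default_rng(0)
n_frames, n_feat, dim, n_atoms = 6, 60, 16, 32
X = rng.normal(size=(n_feat, dim))            # local descriptors
D = rng.normal(size=(n_atoms, dim))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm atoms
frame_ids = rng.integers(0, n_frames, size=n_feat)
positions = rng.uniform(size=(n_feat, 2))      # normalized (x, y)

A = sparse_codes(X, D)
importance = np.linalg.norm(A, axis=1)         # l2 norm of sparse coefficients
scores = frame_scores(importance, frame_ids, center_weight(positions), n_frames)
keyframes = select_keyframes(scores, k=2)
```

In practice the descriptors would come from an interest-point detector and the dictionary would be learned from the video itself; random data is used here only so that the sketch runs on its own.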

Published in:

Multimedia and Expo Workshops (ICMEW), 2013 IEEE International Conference on

Date of Conference:

15-19 July 2013