Discovering Video Shot Categories by Unsupervised Stochastic Graph Partition

3 Author(s)
Xiaohua Duan (Sun Yat-Sen University, Guangzhou, P.R. China); Liang Lin; Hongyang Chao

Video shots are often treated as the basic elements for retrieving information from videos. In recent years, video shot categorization has received increasing attention, but most methods rely on supervised learning, i.e., training a multi-class predictor (classifier) on labeled data. In this paper, we study a general framework for discovering video shot categories without supervision. The contributions are three-fold, in feature, representation, and inference: (1) A new feature is proposed to capture local information in videos, defined on small video patches (e.g., 11 × 11 × 5 pixels). A dictionary of video words can thus be clustered off-line, characterizing both appearance and motion dynamics. (2) We pose categorization as an automated graph partition task, in which each graph vertex represents a video shot and each partitioned sub-graph of connected vertices represents a clustered category. The model of each video shot category can be calculated analytically by a projection-pursuit type of learning process. (3) An MCMC-based cluster sampling algorithm, namely Swendsen-Wang cuts, is adopted to solve the graph partition efficiently. Unlike traditional graph partition techniques, this algorithm is able to explore a near-globally optimal solution and eliminates the need for good initialization. We apply our method to a diverse set of 1600 video shots collected from the Internet, as well as to a subset of the TRECVID 2010 data, and adopt two benchmark metrics, Purity and Conditional Entropy, to evaluate performance. The experimental results demonstrate superior performance of our method over other popular state-of-the-art methods.
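
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions (purity as the fraction of items that carry their cluster's majority label, and conditional entropy as H(label | cluster) in bits); the paper's exact formulations may differ. The function names `purity` and `conditional_entropy` and the toy labels are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def _label_counts_per_cluster(clusters, labels):
    """Group ground-truth labels by cluster id and count them."""
    counts = {}
    for cluster_id, label in zip(clusters, labels):
        counts.setdefault(cluster_id, Counter())[label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of items that carry the majority label of their cluster.

    Assumes the standard definition; 1.0 means every cluster is pure.
    """
    counts = _label_counts_per_cluster(clusters, labels)
    return sum(max(c.values()) for c in counts.values()) / len(labels)


def conditional_entropy(clusters, labels):
    """H(label | cluster) in bits; 0.0 means every cluster is pure.

    Assumes the entropy is conditioned on the cluster assignment.
    """
    n = len(labels)
    entropy = 0.0
    for label_counts in _label_counts_per_cluster(clusters, labels).values():
        cluster_size = sum(label_counts.values())
        for k in label_counts.values():
            # p(cluster, label) * log2 p(label | cluster)
            entropy -= (k / n) * math.log2(k / cluster_size)
    return entropy


# Toy example: six shots, two discovered clusters, two true categories.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["news", "news", "sports", "sports", "sports", "sports"]
toy_purity = purity(toy_clusters, toy_labels)
toy_entropy = conditional_entropy(toy_clusters, toy_labels)
```

In this toy case one of the six shots sits in a cluster dominated by another category, so purity falls below 1 and the conditional entropy is positive. Higher purity and lower conditional entropy both indicate clusters that align better with the true categories.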

Published in:

IEEE Transactions on Multimedia (Volume: 15, Issue: 1)