We are currently experiencing intermittent issues impacting performance. We apologize for the inconvenience.
By Topic

Cross-Domain Multicue Fusion for Concept-Based Video Indexing

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
Weng, Ming-Fang ; National Taiwan University, Taipei ; Yung-Yu Chuang

The success of query-by-concept, proposed recently to cater to video retrieval needs, depends greatly on the accuracy of concept-based video indexing. Unfortunately, it remains a challenge to recognize the presence of concepts in a video segment or to extract an objective linguistic description from it because of the semantic gap, that is, the lack of correspondence between machine-extracted low-level features and human high-level conceptual interpretation. This paper studies three issues with the aim to reduce such a gap: 1) how to explore cues beyond low-level features, 2) how to combine diverse cues to improve performance, and 3) how to utilize the learned knowledge when applying it to a new domain. To solve these problems, we propose a framework that jointly exploits multiple cues across multiple video domains. First, recursive algorithms are proposed to learn both interconcept and intershot relationships from annotations. Second, all concept labels for all shots are simultaneously refined in a single fusion model. Additionally, unseen shots are assigned pseudolabels according to their initial prediction scores so that contextual and temporal relationships can be learned, thus requiring no additional human effort. Integration of cues embedded within training and testing video sets accommodates domain change. Experiments on popular benchmarks show that our framework is effective, achieving significant improvements over popular baselines.

Published in:

Pattern Analysis and Machine Intelligence, IEEE Transactions on  (Volume:34 ,  Issue: 10 )