Multimodal Video Indexing and Retrieval Using Directed Information

3 Author(s)
Xu Chen ; Dept. of Electr. Eng. & Comput. Sci., Univ. of Michigan at Ann Arbor, Ann Arbor, MI, USA ; Hero, A.O. ; Savarese, S.

We propose a novel framework for multimodal video indexing and retrieval using shrinkage optimized directed information assessment (SODA) as the similarity measure. The directed information (DI) is a variant of the classical mutual information that attempts to capture the direction of information flow that videos naturally possess. It is applied directly to the empirical probability distributions of audio and visual features over successive frames. We utilize RASTA-PLP features for audio feature representation and SIFT features for visual feature representation. We compute the joint probability density functions of audio and visual features in order to fuse features from different modalities. With SODA, we further estimate the DI between pairs of video-audio modalities in a manner that is suitable for high-dimensional features (dimension p) and small sample size n (large p, small n). We demonstrate the superiority of the SODA approach in video indexing, retrieval, and activity recognition as compared to state-of-the-art methods such as hidden Markov models (HMM), support vector machines (SVM), and cross-media indexing space (CMIS), and to noncausal divergence measures such as mutual information (MI). We also demonstrate the success of SODA in audio and video localization and in indexing/retrieval of data with misaligned modalities.
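
For readers unfamiliar with the measure, the contrast between directed information and mutual information can be written out as follows. This is the standard textbook (Massey) definition, not necessarily the exact form or notation used in the paper; the symbols X^N and Y^N (two feature sequences of length N, for example one per modality) are assumptions introduced here for illustration, and the shrinkage estimator behind SODA is not reproduced.

```latex
% Assumed notation: X^n = (X_1, \dots, X_n), Y^n = (Y_1, \dots, Y_n).
% Directed information from X^N to Y^N (causal: only X up to time n is used):
I(X^N \to Y^N) = \sum_{n=1}^{N} I\left(X^{n};\, Y_n \,\middle|\, Y^{n-1}\right)

% Mutual information, expanded by the chain rule (noncausal: the whole X^N is used):
I(X^N;\, Y^N) = \sum_{n=1}^{N} I\left(X^{N};\, Y_n \,\middle|\, Y^{n-1}\right)
```

The only difference is the conditioning on X^n rather than X^N inside the sum, which makes the directed information asymmetric: in general I(X^N → Y^N) differs from I(Y^N → X^N), whereas mutual information is symmetric in its arguments.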

Published in:

IEEE Transactions on Multimedia (Volume: 14, Issue: 1)