
A Low-Complexity Dynamic Face-Voice Feature Fusion Approach to Multimodal Person Recognition

Authors:
Dhaval Shah (Ming Hsieh Dept. of Electrical Engineering, University of Southern California, Los Angeles, CA, USA); Kyu J. Han; Shrikanth S. Narayanan

In this paper, we show the importance of face-voice correlation for audio-visual person recognition. We evaluate a system that exploits the correlation between audio and visual features during speech, and compare it against audio-only, video-only, and audio-visual systems that treat the audio and visual features independently, neglecting the interdependency between a person's spoken utterance and the associated facial movements. Experiments on the Vid-TIMIT dataset show that the proposed multimodal scheme has a lower error rate than all other comparison conditions and is more robust against replay attacks. The simplicity of the fusion technique also means that only one classifier is needed, which greatly simplifies system design and allows for a simple real-time DSP implementation.
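The abstract does not spell out the fusion rule, so the following is only a minimal sketch of one generic way to feed a single classifier with both modalities: concatenating time-aligned audio and visual frames and classifying the fused vectors. It is not the authors' method. The feature dimensions (4 audio, 3 visual), the synthetic Gaussian data, the speaker names, and the nearest-centroid classifier are all assumptions chosen for illustration.

```python
import math
import random


def fuse_frames(audio_frames, visual_frames):
    """Concatenate time-aligned audio and visual feature frames (assumed fusion rule)."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same number of frames")
    return [a + v for a, v in zip(audio_frames, visual_frames)]


def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(vector, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def synthetic_stream(rng, mean, n_frames, dim):
    """Hypothetical stand-in for real features: Gaussian frames around `mean`."""
    return [[rng.gauss(mean, 0.1) for _ in range(dim)] for _ in range(n_frames)]


rng = random.Random(0)
speakers = {"speaker_a": 0.0, "speaker_b": 1.0}  # hypothetical identities

# Build one fused feature stream per speaker (4 audio dims + 3 visual dims).
fused = {}
for name, mean in speakers.items():
    audio = synthetic_stream(rng, mean, n_frames=50, dim=4)
    visual = synthetic_stream(rng, mean, n_frames=50, dim=3)
    fused[name] = fuse_frames(audio, visual)

# A single classifier operates on the fused vectors, one centroid per speaker.
centroids = {name: centroid(frames) for name, frames in fused.items()}
predictions = {
    name: [classify(frame, centroids) for frame in frames]
    for name, frames in fused.items()
}
```

Because fusion happens at the feature level, only one model has to be trained and evaluated, which is the design property the abstract highlights. A real system would replace the synthetic streams with extracted speech and face features and would evaluate on held-out data.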

Published in: 11th IEEE International Symposium on Multimedia (ISM '09), 2009

Date of Conference: 14-16 Dec. 2009