
Speaker detection using the timing structure of lip motion and sound


Authors:
Horii, Y.; Kawashima, H.; Matsuyama, T. (Graduate School of Informatics, Kyoto University, Kyoto)

In this paper, we propose a novel approach to speaker detection that integrates audio-visual information using timing structure as a cue. We first extract feature sequences of lip motion and sound and segment each into temporal intervals. We then construct a cross-media timing-structure model of human speech by learning the temporal relations among overlapping intervals. Based on the learned model, we perform speaker detection by evaluating the timing structure of the observed video and audio. Our experimental results show the effectiveness of using the temporal relations of intervals for speaker detection.
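The abstract's pipeline, segmenting each modality into temporal intervals and scoring how well their timing co-occurs, can be illustrated with a much-simplified sketch. This is not the paper's learned cross-media model; it assumes hypothetical binary activity sequences for lip motion and audio and uses a plain interval-overlap score in place of learned temporal relations:

```python
# Simplified sketch (not the paper's actual model): segment two binary
# activity signals into temporal intervals, then score a candidate
# speaker by how much of the audio activity their lip motion overlaps.

def to_intervals(signal):
    """Convert a binary activity sequence into (start, end) intervals, end-exclusive."""
    intervals, start = [], None
    for t, active in enumerate(signal):
        if active and start is None:
            start = t
        elif not active and start is not None:
            intervals.append((start, t))
            start = None
    if start is not None:
        intervals.append((start, len(signal)))
    return intervals

def overlap(a, b):
    """Length of the temporal overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def timing_score(lip, audio):
    """Fraction of audio-interval duration covered by lip-motion intervals.
    A face whose lip motion tracks the sound scores high; a silent or
    unrelated face scores low."""
    audio_iv = to_intervals(audio)
    lip_iv = to_intervals(lip)
    total = sum(e - s for s, e in audio_iv)
    if total == 0:
        return 0.0
    covered = sum(overlap(a, l) for a in audio_iv for l in lip_iv)
    return covered / total

# Toy example: the speaker's lip motion roughly tracks the audio;
# the bystander's motion is unrelated to it.
audio     = [0, 1, 1, 1, 0, 0, 1, 1, 0, 0]
speaker   = [0, 1, 1, 1, 1, 0, 1, 1, 0, 0]
bystander = [1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
print(timing_score(speaker, audio))    # → 1.0
print(timing_score(bystander, audio))  # → 0.0
```

The paper's contribution is precisely what this sketch omits: rather than raw overlap, it learns a model of which temporal relations between overlapping lip and sound intervals are characteristic of genuine speech.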

Published in:

2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '08)

Date of Conference:

23-28 June 2008