By Topic

Audio-visual automatic speech recognition and related bimodal speech technologies: A review of the state-of-the-art and open problems

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

1 Author(s)
Gerasimos Potamianos ; Institute of Informatics and Telecommunications, National Centre for Scientific Research ¿Demokritos¿, GR-15310 Athens, Greece

Summary form only given. The presentation will provide an overview of the main research achievements and the state-of-the-art in the area of audiovisual speech processing, mainly focusing in the area of audio-visual automatic speech recognition. The topic has been of interest in the speech research community due to the potential of increased robustness to acoustic noise that the visual modality holds. Nevertheless, significant challenges remain that have hindered practical applications of the technology most notably difficulties with visual speech information extraction and audio-visual fusion algorithms that remain robust to the audio-visual environment variability inherent in practical, unconstrained interaction scenarios and audio-visual data sources, for example multiparty interaction in smart spaces, broadcast news, etc. These challenges are also shared across a number of interesting audio-visual speech technologies beyond the core speech recognition problem, where the visual modality has the potential to resolve ambiguity inherent in the audio signal alone; for example, speech activity detection, speaker diarization, and source separation.

Published in:

Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on

Date of Conference:

Nov. 13 2009-Dec. 17 2009