Multimodal ASR by Turbo Decoding vs. Feature Concatenation: Where to Perform Information Integration?

Abstract:

Incorporating visual information has been shown to be an effective way of making automatic speech recognition (ASR) robust against environmental interference. However, the optimal stage of information integration in multimodal speech processing remains an open question. While results from multiple-in-single-out (MISO) mobile communications suggest early integration levels, multimodal ASR may suffer from early integration owing to the inherent asynchrony of audio and video features. In this paper we investigate whether early or middle integration strategies perform best in multimodal ASR by comparing feature concatenation and turbo decoding approaches. On an audio-visual speech recognition task using a large database, we show a significant benefit of turbo ASR approaches (middle integration) over early-integration feature vector concatenation, which they outperform by about 13% absolute at a signal-to-noise ratio (SNR) of 0 dB.

Published in: Speech Communication; 11. ITG Symposium

Date of Conference: 24-26 September 2014

Date Added to IEEE Xplore: 17 October 2014

Print ISBN: 978-3-8007-3640-9

Conference Location: Erlangen, Germany
