
A Method of Joint Compensation of Additive and Convolutive Distortions for Speaker-Independent Speech Recognition



Author: Yifan Gong (DSP Solutions R&D Center, Speech Technology Lab., Dallas, TX, USA)

A speech recognizer operating in a mobile environment has to be robust to two distortion sources: ambient noise (additive distortion) and microphone changes (convolutive distortion). Explicitly and simultaneously modeling the two distortion sources has been a great challenge for speech recognition in adverse environments. In this paper, two log-spectral domain components are introduced in speech acoustic models to represent additive and convolutive distortions. A method, called JAC, jointly compensates both additive and convolutive distortions. For each utterance to be recognized, it adapts HMM mean vectors with a noise estimate and a channel estimate. The noise estimate is calculated from the pre-utterance pause and the channel estimate is calculated using an EM algorithm from speech utterances produced in the distortion environment. The algorithm is evaluated on a noisy speech database recorded in-vehicle with a hands-free distant microphone in several sessions, including parked, stop-and-go, and highway driving conditions. Experiments show that the method typically reduces recognition word error rate by an order of magnitude. The method makes it possible to obtain high performance for speaker-independent recognition in changing noisy environments without collecting any noisy speech for training.
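The abstract describes adapting clean-speech HMM mean vectors in the log-spectral domain using a noise estimate (additive distortion) and a channel estimate (convolutive distortion). A minimal sketch of this style of compensation, using the standard log-add corruption model rather than the paper's exact formulation (all names here are illustrative assumptions, not taken from the paper):

```python
import numpy as np

def compensate_mean(mu_x, h, mu_n):
    """Adapt a clean-speech log-spectral HMM mean for noise and channel.

    Under the usual log-add model of corrupted speech,
        mu_y = log(exp(mu_x + h) + exp(mu_n)),
    where mu_x is the clean log-spectral mean vector, h the channel
    (convolutive) estimate, and mu_n the noise (additive) estimate.
    This is a generic sketch; the paper's JAC method additionally
    estimates h with an EM algorithm and mu_n from the pre-utterance
    pause, which is not shown here.
    """
    # np.logaddexp computes log(exp(a) + exp(b)) in a numerically
    # stable way, element-wise over the log-spectral dimensions.
    return np.logaddexp(mu_x + h, mu_n)
```

When the noise estimate is far below the speech level, the compensated mean reduces to `mu_x + h` (channel shift only); when noise dominates, it approaches `mu_n`, matching the intuition that heavily masked channels carry mostly noise energy.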

Published in:

IEEE Transactions on Speech and Audio Processing (Volume: 13, Issue: 5)