Joint Uncertainty Decoding With Predictive Methods for Noise Robust Speech Recognition

Authors: Haitian Xu (Toshiba Research Europe, Ltd., Cambridge, UK); M. J. F. Gales; K. K. Chin

Model-based noise compensation techniques are a powerful approach to improving speech recognition performance in noisy environments. However, one of the major issues with these schemes is that they are computationally expensive. Though techniques have been proposed to address this problem, they often degrade performance. This paper proposes a new, highly flexible approach that allows the computational load required for noise compensation to be controlled while maintaining good performance. The scheme combines improved joint uncertainty decoding with the predictive linear transform framework. The final compensation is implemented as a set of linear transforms of the features, decoupling the computational cost of compensation from the complexity of the recognition system's acoustic models. Furthermore, by using linear transforms, changes in the correlations in the feature vector can also be efficiently modeled. The proposed methods can be easily applied in an adaptive training scheme, including discriminative adaptive training. The performance of the approach is compared to a number of standard schemes on Aurora 2 as well as in-car speech recognition tasks. Results indicate that the proposed scheme is an attractive alternative to existing approaches.
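The decoupling the abstract describes can be illustrated with a minimal sketch: in joint-uncertainty-decoding-style compensation, each regression class applies one affine transform to the observed features and one uncertainty bias to the model variances, so the per-frame cost scales with the number of classes and the feature dimension rather than the number of Gaussians in the acoustic model. The sketch below is an illustrative assumption about the general form (function name, shapes, and diagonal covariances are hypothetical), not the paper's implementation.

```python
import numpy as np

def jud_compensate(y, A, b, sigma_bias, model_var):
    """Sketch of one regression class's JUD-style compensation.

    y          : (T, D) noisy feature vectors
    A, b       : (D, D) matrix and (D,) bias of the affine feature transform
    sigma_bias : (D,) diagonal uncertainty bias for this class
    model_var  : (D,) diagonal variance of a clean-trained Gaussian

    Returns the transformed features and the inflated variance used when
    evaluating that Gaussian's likelihood.
    """
    y_hat = y @ A.T + b               # feature-space linear transform
    var_hat = model_var + sigma_bias  # variance inflated by the uncertainty bias
    return y_hat, var_hat

# Tiny usage example with synthetic numbers.
rng = np.random.default_rng(0)
T, D = 5, 3
y = rng.normal(size=(T, D))
A = 0.9 * np.eye(D)           # illustrative transform
b = np.zeros(D)
sigma_bias = np.full(D, 0.1)  # illustrative uncertainty bias
model_var = np.ones(D)

y_hat, var_hat = jud_compensate(y, A, b, sigma_bias, model_var)
```

Because the same `(A, b, sigma_bias)` triple is shared by every Gaussian assigned to the class, compensating more Gaussians adds no extra transform cost, which is the flexibility the paper exploits.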

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 19, Issue: 6)