By Topic

Noisy Constrained Maximum-Likelihood Linear Regression for Noise-Robust Speech Recognition

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$33 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

2 Author(s)
D. K. Kim ; Department of Electronics and Computer Engineering, Chonnam National University, Gwangju, Korea ; M. J. F. Gales

Adaptive training is a widely used technique for building speech recognition systems on nonhomogeneous training data. Recently, there has been interest in applying these approaches for situations where there is significant levels of background noise in the training data. Various schemes for adaptive training are based on noise-, or speaker-, specific transforms of features to yield estimates of the clean speech. However, when there are high levels of background noise, these clean speech estimates may be poor resulting in degradations in performance. In this paper, a new approach for adaptive training on noise-corrupted training data is presented. It extends a popular form of linear transform for model-based adaptation and adaptive training, constrained MLLR (CMLLR), to reflect additional uncertainty from noise-corrupted observations. This new form of adaptation transform is called noisy CMLLR (NCMLLR). NCMLLR uses a modified version of generative model between clean speech and noisy observation, similar to factor analysis (FA). However, in contrast to FA here the generative model describes an adaptation transform, rather than a covariance matrix structure. The use of NCMLLR for adaptive training using an expectation-maximization approach is described. Discriminative adaptive training with NCMLLR is also described based on the minimum phone error criterion. Experimental results comparing NCMLLR with standard adaptive training schemes are given on a noise-corrupted version of Resource Management, the ARPA 1994 CSRNAB Spoke 10 task, and in-car recorded data.

Published in:

IEEE Transactions on Audio, Speech, and Language Processing  (Volume:19 ,  Issue: 2 )
IEEE Biometrics Compendium
IEEE RFIC Virtual Journal
IEEE RFID Virtual Journal