
Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR

Authors: Xiaodong Cui and Abeer Alwan (Dept. of Electrical Engineering, University of California, Los Angeles, CA, USA)

A feature compensation (FC) algorithm based on polynomial regression of utterance signal-to-noise ratio (SNR) for noise-robust automatic speech recognition (ASR) is proposed. In this algorithm, the bias between clean and noisy speech features is approximated by a set of polynomials, which are estimated from adaptation data from the new environment by the expectation-maximization (EM) algorithm under the maximum likelihood (ML) criterion. In ASR, the utterance SNR of the speech signal is first estimated, and the noisy speech features are then compensated by the regression polynomials. The compensated speech features are decoded with acoustic HMMs trained on clean data. Comparative experiments between FC and maximum likelihood linear regression (MLLR) are performed on the Aurora 2 (English) database and the German part of the Aurora 3 database. In the Aurora 2 experiments, two MLLR implementations are considered: pooling adaptation data across all SNRs, and using three distinct SNR clusters. For each type of noise, FC achieves, on average, a word error rate reduction of 16.7% and 16.5% for Set A, and 20.5% and 14.6% for Set B, compared to the first and second MLLR implementations, respectively. For each SNR condition, FC achieves, on average, a word error rate reduction of 33.1% and 34.5% for Set A, and 23.6% and 21.4% for Set B. Results on the Aurora 3 database show that the best FC performance outperforms MLLR by 15.9%, 3.0% and 14.6% for the well-matched, medium-mismatched and high-mismatched conditions, respectively.
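The recognition-time compensation step described above can be illustrated with a short sketch. This is an illustrative simplification, not the authors' implementation: the function names, the feature dimensionality, and the use of a single shared polynomial per feature dimension (rather than per-Gaussian polynomials estimated by EM) are assumptions; the utterance SNR is taken as given rather than estimated from the signal.

```python
import numpy as np

def compensate_features(noisy_features, snr_db, poly_coeffs):
    """Subtract an SNR-dependent bias from each feature dimension.

    noisy_features : (n_frames, dim) array of noisy cepstral features
    snr_db         : scalar utterance-level SNR estimate in dB
    poly_coeffs    : (order + 1, dim) array of regression-polynomial
                     coefficients; the bias for dimension d is
                     bias_d = sum_k poly_coeffs[k, d] * snr_db**k

    Returns the compensated features (same shape as input).
    """
    order = poly_coeffs.shape[0] - 1
    # Powers of the utterance SNR: [1, snr, snr^2, ...]
    snr_powers = snr_db ** np.arange(order + 1)
    # Evaluate the bias polynomial for every feature dimension at once.
    bias = snr_powers @ poly_coeffs            # shape (dim,)
    # Compensation: remove the estimated clean/noisy bias.
    return noisy_features - bias
```

In the paper's scheme the coefficients themselves come from EM-based ML estimation on adaptation data; here they are simply assumed to be available. The compensated features would then be passed unchanged to HMMs trained on clean speech.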

Published in: IEEE Transactions on Speech and Audio Processing (Volume 13, Issue 6)