A feature compensation (FC) algorithm based on polynomial regression of utterance signal-to-noise ratio (SNR) is proposed for noise-robust automatic speech recognition (ASR). In this algorithm, the bias between clean and noisy speech features is approximated by a set of polynomials, which are estimated from adaptation data from the new environment by the expectation-maximization (EM) algorithm under the maximum likelihood (ML) criterion. During recognition, the utterance SNR of the speech signal is first estimated, and the noisy speech features are then compensated using the regression polynomials. The compensated speech features are decoded with acoustic HMMs trained on clean data. Comparative experiments between FC and maximum likelihood linear regression (MLLR) are performed on the Aurora 2 (English) database and the German part of the Aurora 3 database. In the Aurora 2 experiments, there are two MLLR implementations: one pools adaptation data across all SNRs, and the other uses three distinct SNR clusters. For each type of noise, FC achieves, on average, a word error rate reduction of 16.7% and 16.5% for Set A, and 20.5% and 14.6% for Set B, compared to the first and second MLLR implementations, respectively. For each SNR condition, FC achieves, on average, a word error rate reduction of 33.1% and 34.5% for Set A, and 23.6% and 21.4% for Set B. Results on the Aurora 3 database show that the best FC result outperforms MLLR by 15.9%, 3.0% and 14.6% for the well-matched, medium-mismatched and high-mismatched conditions, respectively.
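The compensation step described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: one polynomial per feature dimension, a bias that is subtracted from the noisy features, and coefficients that are already available (the EM estimation is not shown). The names `polynomial_bias` and `compensate` are hypothetical.

```python
import numpy as np

def polynomial_bias(coeffs, snr_db):
    """Evaluate one polynomial in the utterance SNR per feature dimension.

    coeffs: array of shape (D, K + 1); column k multiplies snr_db ** k.
    Returns an array of shape (D,) holding the estimated bias.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    # Powers of the SNR: [1, snr, snr**2, ...]
    powers = float(snr_db) ** np.arange(coeffs.shape[1])
    return coeffs @ powers

def compensate(noisy_features, coeffs, snr_db):
    """Subtract the SNR-dependent bias from every frame of an utterance.

    noisy_features: array of shape (T, D), one feature vector per frame.
    Assumption: the bias is defined as noisy minus clean, so it is subtracted.
    """
    noisy = np.asarray(noisy_features, dtype=float)
    return noisy - polynomial_bias(coeffs, snr_db)

# Example with made-up first-order coefficients for two feature dimensions.
coeffs = [[1.0, 0.5],
          [0.0, -0.1]]
noisy = [[6.0, -1.0],
         [7.0, 0.0]]
compensated = compensate(noisy, coeffs, snr_db=10.0)
```

In this sketch the utterance SNR is passed in as a number; in practice it would come from an SNR estimator, and the compensated frames would then be passed to the clean-trained decoder.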