A maximum-likelihood approach to stochastic matching for robust speech recognition

2 Author(s)
Sankar, Ananth; Speech Res. Dept., AT&T Bell Labs., Murray Hill, NJ, USA; Chin-Hui Lee

Presents a maximum-likelihood (ML) stochastic matching approach that decreases the acoustic mismatch between a test utterance and a given set of speech models, so as to reduce the degradation in recognition performance caused by distortions in the test utterance and/or the model set. We assume that the speech signal is modeled by a set of subword hidden Markov models (HMMs) Λx. The mismatch between the observed test utterance Y and the models Λx can be reduced in two ways: 1) by an inverse distortion function Fν(.) that maps Y into an utterance X that better matches the models Λx, and 2) by a model transformation function Gη(.) that maps Λx to a transformed model Λy that better matches the utterance Y. We assume the functional form of the transformation Fν(.) or Gη(.) and estimate the parameters ν or η in an ML manner using the expectation-maximization (EM) algorithm. The choice of the form of Fν(.) or Gη(.) is based on prior knowledge of the nature of the acoustic mismatch. The stochastic matching algorithm operates only on the given test utterance and the given set of speech models; no additional training data is required to estimate the mismatch prior to actual testing. Experimental results are presented to study the properties of the proposed algorithm and to verify the efficacy of the approach in improving the performance of an HMM-based continuous speech recognition system in the presence of mismatch due to different transducers and transmission channels.
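
As an illustration of the feature-space case, the following is a minimal sketch of EM estimation of ν for one possible choice of Fν(.). It is not the authors' implementation and rests on several assumptions that do not come from the abstract: the features are one-dimensional, the inverse distortion is a single additive bias, Fν(y) = y - b, and the subword HMM set Λx is replaced by a fixed Gaussian mixture so that the E-step reduces to component posteriors. The names estimate_bias and gaussian_pdf are hypothetical.

```python
import math
import random


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def estimate_bias(observations, weights, means, variances, iterations=20):
    """EM estimate of an additive bias b so that x = y - b fits the model.

    Assumption: the model is a fixed 1-D Gaussian mixture standing in
    for the HMM set; only the bias is re-estimated, never the model.
    """
    bias = 0.0
    for _ in range(iterations):
        numerator = 0.0
        denominator = 0.0
        for y in observations:
            x = y - bias  # compensated sample under the current bias
            # E-step: posterior probability of each mixture component.
            likes = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(likes)
            if total == 0.0:
                continue  # sample too far from every component; skip it
            for like, m, v in zip(likes, means, variances):
                gamma = like / total
                # M-step statistics: variance-weighted residuals.
                numerator += gamma * (y - m) / v
                denominator += gamma / v
        if denominator == 0.0:
            break
        bias = numerator / denominator
    return bias


if __name__ == "__main__":
    random.seed(0)
    weights, means, variances = [0.5, 0.5], [-2.0, 3.0], [1.0, 1.0]
    # Simulated "clean" samples from the model, then a channel-like offset.
    clean = [random.gauss(random.choice(means), 1.0) for _ in range(2000)]
    observed = [x + 1.5 for x in clean]
    print("estimated bias:", estimate_bias(observed, weights, means, variances))
```

Under these assumptions the M-step has a closed form: the bias is the posterior-weighted, variance-normalized average of the residuals y - μ. As in the approach described above, the estimate uses only the test samples and the given model, with no additional training data.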

Published in:

IEEE Transactions on Speech and Audio Processing (Volume: 4, Issue: 3)