This paper presents a new approach to speech feature enhancement in the log-spectral domain for noisy speech recognition. A switching linear dynamic model (SLDM) is explored as a parametric model for the clean speech distribution. Each multivariate linear dynamic model (LDM) is associated with a hidden state of a hidden Markov model (HMM), in an attempt to describe the temporal correlations among adjacent frames of speech features. A state transition on the Markov chain activates a different LDM, or activates several of them simultaneously with different probabilities generated by the HMM. Rather than holding a single fixed set of transition probabilities for the whole process, a connectionist model is employed to learn time-variant transition probabilities. With the resulting SLDM as the speech model and with a model for the noise, speech and noise are jointly tracked by means of switching Kalman filtering. Comprehensive experiments are carried out on the Aurora2 database to evaluate the new algorithm. The results show that the new SLDM approach can further improve speech feature enhancement performance in terms of noise-robust recognition accuracy, since the transition probabilities among the LDMs can be described more precisely at each time point.
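As a reading aid, the model described above can be written in a generic form. This is only a sketch under assumed notation, not the formulation of the paper itself: the symbols $x_t$ (clean log-spectral feature vector), $n_t$ (noise), $y_t$ (observed noisy feature), $s_t$ (HMM state), $A_s$, $b_s$, $Q_s$ (parameters of the LDM tied to state $s$), $u_t$ (input to the connectionist model) and $g_\theta$ (its output scores) are all introduced here for illustration. The softmax form of the transition probabilities and the log-spectral mixing function are common choices, assumed here rather than taken from the abstract.

```latex
% Assumed notation throughout; a sketch, not the paper's equations.

% One LDM per HMM state s_t: linear dynamics of the clean speech feature
x_t = A_{s_t}\, x_{t-1} + b_{s_t} + w_t,
\qquad w_t \sim \mathcal{N}(0,\, Q_{s_t})

% Conventional HMM: one fixed transition matrix for the whole process
P(s_t = j \mid s_{t-1} = i) = a_{ij}

% Time-variant alternative: transition probabilities produced at each
% frame by a connectionist model g_\theta (softmax form is an assumption)
P(s_t = j \mid s_{t-1} = i,\, u_t)
  = \frac{\exp\big(g_\theta(u_t)_{ij}\big)}
         {\sum_{k} \exp\big(g_\theta(u_t)_{ik}\big)}

% Noisy observation in the log-spectral domain (a commonly used
% approximation that ignores phase; r_t is a residual error term)
y_t = x_t + \log\!\big(1 + \exp(n_t - x_t)\big) + r_t
```

Under these assumptions, switching Kalman filtering would run one Kalman filter per LDM on the joint state $(x_t, n_t)$, weight the per-state estimates by the posterior state probabilities, and collapse them into a single enhanced feature estimate at each frame.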
Published in: IEEE Transactions on Audio, Speech, and Language Processing (Volume: 16, Issue: 5)
Date of Publication: July 2008