To improve recognition, speech signals corrupted by a variety of noises can be used in speech model training. Published approaches to hidden Markov modeling of speech use multiple Gaussian distributions to cover the spread of the speech distribution caused by the noises, which distracts from modeling the speech events themselves and possibly sacrifices performance on clean speech. We extend the Gaussian mixture HMM (GMHMM) by allowing state emission parameters to change as a function of an environment-dependent continuous variable. At recognition time, a set of HMMs specific to the given environment is instantiated and used for recognition. A variable parameter (VP) HMM, with parameters modeled as a polynomial function of the environment variable, is developed. Parameter estimation based on the EM algorithm is given. With the same number of mixtures, VPHMM reduces word error rate (WER) by 40% compared to conventional multi-condition training.
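The instantiation step described in the abstract can be illustrated with a minimal sketch. It assumes that each Gaussian's mean and variance are per-dimension polynomials in a scalar environment variable `v` (for example an SNR estimate, which is an assumption here), and that variances are floored to stay positive. The names `VariableGaussian`, `polyval`, `instantiate`, and `var_floor` are hypothetical and are not taken from the paper, which may parameterize the model differently.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class VariableGaussian:
    """Hypothetical container for one Gaussian whose parameters depend on v.

    mean_coeffs[d][k] is the coefficient of v**k for the mean of dimension d;
    var_coeffs has the same layout for the (diagonal) variance.
    """
    mean_coeffs: List[List[float]]
    var_coeffs: List[List[float]]


def polyval(coeffs: Sequence[float], v: float) -> float:
    """Evaluate c0 + c1*v + c2*v**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * v + c
    return result


def instantiate(g: VariableGaussian, v: float,
                var_floor: float = 1e-3) -> Tuple[List[float], List[float]]:
    """Return the (means, variances) of g for environment value v.

    The variance floor is an assumption of this sketch: a polynomial can
    go non-positive, so values are clamped to var_floor.
    """
    means = [polyval(c, v) for c in g.mean_coeffs]
    variances = [max(polyval(c, v), var_floor) for c in g.var_coeffs]
    return means, variances


# Illustrative coefficients only; not values from the paper.
g = VariableGaussian(
    mean_coeffs=[[1.0, 0.5], [0.0, 0.0, 0.25]],
    var_coeffs=[[1.0], [2.0, -1.0]],
)
means, variances = instantiate(g, v=2.0)
```

In this sketch, a conventional Gaussian is the special case in which every polynomial has only a constant term, so the instantiated parameters no longer depend on `v`.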
Published in:
Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
(Volume: 1)
Date of Conference: 6-10 April 2003