In this paper, we introduce Bayesian sensing hidden Markov models (BS-HMMs) to represent sequential data based on a set of state-dependent basis vectors. The goal of this work is to perform Bayesian sensing and model regularization for heterogeneous training data. By incorporating a prior density on sensing weights, the relevance of different bases to a feature vector is determined by the corresponding precision parameters. The BS-HMM parameters, consisting of the basis vectors, the precision matrices of sensing weights and the precision matrices of reconstruction errors, are jointly estimated by maximizing the likelihood function, which is marginalized over the weight priors. We derive recursive solutions for the three parameters, which are expressed via maximum a posteriori estimates of the sensing weights. We specifically optimize BS-HMMs for large-vocabulary continuous speech recognition (LVCSR) by introducing a mixture model of BS-HMMs and by adapting the basis vectors to different speakers. Discriminative training of BS-HMMs in the model domain and the feature domain is also proposed. Experimental results on an LVCSR task show consistent improvements due to the three sets of BS-HMM parameters and demonstrate how the extensions of mixture models, speaker adaptation, and discriminative training achieve better recognition results compared to those of conventional HMMs based on Gaussian mixture models.
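The role of the precision parameters can be illustrated with a minimal sketch for a single state and a single feature vector. It assumes a linear Gaussian form that the abstract does not spell out: a feature vector x is reconstructed as Phi w plus an error term, with a zero-mean Gaussian prior on the sensing weights w (precision matrix A) and a zero-mean Gaussian reconstruction error (precision matrix R). The function names and the symbols Phi, A, R are illustrative assumptions, not the paper's notation or implementation, and the sketch omits the HMM recursions, mixtures, speaker adaptation, and discriminative training.

```python
import numpy as np

def map_sensing_weights(x, Phi, A, R):
    """MAP estimate of w, assuming x = Phi @ w + e with
    w ~ N(0, inv(A)) and e ~ N(0, inv(R)) (assumed model form)."""
    PtR = Phi.T @ R
    # Posterior precision is Phi^T R Phi + A; solve rather than invert.
    return np.linalg.solve(PtR @ Phi + A, PtR @ x)

def marginal_log_likelihood(x, Phi, A, R):
    """Log-likelihood of x with the weights integrated out:
    x ~ N(0, inv(R) + Phi inv(A) Phi^T) under the same assumptions."""
    cov = np.linalg.inv(R) + Phi @ np.linalg.solve(A, Phi.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = x @ np.linalg.solve(cov, x)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

# Toy usage: two basis vectors (columns of Phi) in a 2-dimensional feature space.
Phi = np.eye(2)
A = np.eye(2)          # prior precision of the sensing weights
R = np.eye(2)          # precision of the reconstruction error
x = np.array([2.0, 4.0])

w_map = map_sensing_weights(x, Phi, A, R)
loglik = marginal_log_likelihood(x, Phi, A, R)
```

Under these assumptions, increasing a diagonal entry of A shrinks the corresponding weight toward zero, which is one way to read the statement that the precision parameters determine how relevant each basis is to a feature vector.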
IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 1)
Date of Publication: Jan. 2012