A maximum a posteriori approach to speaker adaptation using the trended hidden Markov model

Author(s): R. Chengalvarayan (Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont., Canada); Li Deng

A formulation of the maximum a posteriori (MAP) approach to speaker adaptation is presented for the trended, or nonstationary-state, hidden Markov model (HMM), in which the Gaussian means in each HMM state are characterized by time-varying polynomial trend functions of the state sojourn time. Assuming that the polynomial coefficients in the trend functions are uncorrelated, we obtain analytical results for the MAP estimates of the parameters, including the time-varying means and the time-invariant precisions. We have implemented a speech recognizer based on these results and used it in speaker adaptation experiments on the TI46 corpora. The experimental evaluation demonstrates that the trended HMM, with either the linear or the quadratic polynomial trend function, consistently outperforms the conventional, stationary-state HMM. The evaluation also shows that the unadapted, speaker-independent models are outperformed by models adapted with the supervised MAP procedure, even with as few as a single adaptation token. Further, adapting the polynomial coefficients alone is shown to be better than adapting both the polynomial coefficients and the precision matrices when fewer than four adaptation tokens are used, while the reverse is found with a greater number of adaptation tokens.
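To make the idea of a polynomial trend function with a MAP-adapted mean concrete, the following is a minimal sketch for a single scalar feature in a single state. It is not the paper's derivation: it assumes a textbook Bayesian linear regression with independent Gaussian priors on the coefficients and a known, fixed observation precision. The names `trend_mean`, `solve_linear`, `map_trend_coeffs`, `prior_mean`, `prior_prec`, and `noise_prec` are hypothetical and chosen only for illustration.

```python
def trend_mean(coeffs, t):
    """Polynomial trend: mean at sojourn time t is sum_p coeffs[p] * t**p."""
    return sum(b * t ** p for p, b in enumerate(coeffs))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def map_trend_coeffs(obs, prior_mean, prior_prec, noise_prec):
    """MAP estimate of trend coefficients (illustrative assumption, see lead-in).

    obs[t] is the observation at sojourn time t = 0, 1, 2, ...
    prior_mean[p], prior_prec[p]: independent Gaussian prior on coefficient p.
    noise_prec: fixed precision of the observation noise.
    Solves (r * sum_t v_t v_t^T + diag(tau)) b = r * sum_t v_t o_t + tau * m,
    where v_t = (1, t, t**2, ...).
    """
    order = len(prior_mean)
    a = [[prior_prec[i] if i == j else 0.0 for j in range(order)]
         for i in range(order)]
    rhs = [prior_prec[i] * prior_mean[i] for i in range(order)]
    for t, o in enumerate(obs):
        v = [float(t) ** p for p in range(order)]
        for i in range(order):
            rhs[i] += noise_prec * v[i] * o
            for j in range(order):
                a[i][j] += noise_prec * v[i] * v[j]
    return solve_linear(a, rhs)
```

In this sketch the prior plays the role of the speaker-independent model: with no adaptation data the estimate equals `prior_mean`, and as more adaptation frames arrive the estimate moves toward the least-squares fit of the new speaker's data. A one-element coefficient list corresponds to the conventional stationary-state mean, two elements to a linear trend, and three to a quadratic trend.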

Published in:

IEEE Transactions on Speech and Audio Processing (Volume: 9, Issue: 5)