Adaptation of Hidden Markov Models Using Model-as-Matrix Representation

1 Author(s)
Yongwon Jeong ; School of Electrical Engineering, Pusan National University, Busan, Korea

In this paper, we describe basis-based speaker adaptation techniques using the matrix representation of training models. Bases are obtained from training models by decomposition techniques for matrix-variate objects: two-dimensional principal component analysis (2DPCA) and generalized low rank approximations of matrices (GLRAM). The motivation for using matrix representation is that the sample covariance matrix of training models can be more accurately computed and the speaker weight becomes a matrix. Speaker adaptation equations are derived in the maximum-likelihood (ML) framework, and the adaptation equations can be solved using the maximum-likelihood linear regression technique. Additionally, novel applications of probabilistic 2DPCA and GLRAM to speaker adaptation are presented. From the probabilistic 2DPCA/GLRAM of training models, speaker adaptation equations are formulated in the maximum a posteriori (MAP) framework. The adaptation equations can be solved using the MAP linear regression technique. In the isolated-word experiments, the matrix representation-based methods in the ML and MAP frameworks outperformed maximum-likelihood linear regression adaptation, MAP adaptation, eigenvoice, and probabilistic PCA-based model for adaptation data longer than 20 s. Furthermore, the adaptation methods using probabilistic 2DPCA/GLRAM showed additional performance improvement over the adaptation methods using 2DPCA/GLRAM for small amounts of adaptation data.
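The 2DPCA decomposition mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how each training model is arranged as a matrix, so the layout below (a stack of N models, each an m-by-n array) is an assumption, as are the function names `fit_2dpca`, `project`, and `reconstruct` and the random stand-in data. The sketch covers only the basis extraction and the matrix-valued speaker weight. It does not implement the ML or MAP adaptation equations, GLRAM, or the probabilistic variants described in the paper.

```python
import numpy as np


def fit_2dpca(models, k):
    """Estimate a 2DPCA basis from a stack of matrix-shaped models.

    models: array of shape (N, m, n), one m-by-n matrix per training speaker
            (this layout is an assumption for illustration).
    k:      number of basis vectors to keep (k <= n).
    Returns the mean matrix (m, n) and an orthonormal basis (n, k).
    """
    mean = models.mean(axis=0)
    centered = models - mean
    # n-by-n covariance: average over speakers of C_i^T C_i
    cov = np.einsum("imn,imp->np", centered, centered) / models.shape[0]
    # eigh returns eigenvalues in ascending order, so reverse the columns
    _, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, ::-1][:, :k]
    return mean, basis


def project(model, mean, basis):
    """Weight matrix (m, k) of one model with respect to the basis."""
    return (model - mean) @ basis


def reconstruct(weights, mean, basis):
    """Approximate model (m, n) rebuilt from its weight matrix."""
    return mean + weights @ basis.T


# Hypothetical data: 50 training models, each a 6-by-4 matrix
rng = np.random.default_rng(0)
models = rng.normal(size=(50, 6, 4))

mean, basis = fit_2dpca(models, k=2)
weights = project(models[0], mean, basis)
approx = reconstruct(weights, mean, basis)
```

Note that the weight for each model is an m-by-k matrix rather than a vector, which corresponds to the abstract's remark that "the speaker weight becomes a matrix". In this sketch the weights come from direct projection of a known model. In the paper they are instead estimated from adaptation data.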

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 8)