Rapid discriminative acoustic model based on eigenspace mapping for fast speaker adaptation

Authors: Bowen Zhou (Robust Speech Processing Group, University of Colorado, Boulder, CO, USA); J. H. L. Hansen

It is widely believed that strong correlations exist across an utterance as a consequence of time-invariant characteristics of speaker and acoustic environments. It is verified in this paper that the first primary eigendirections of the utterance covariance matrix are speaker dependent. Based on this observation, a novel family of fast speaker adaptation algorithms entitled Eigenspace Mapping (EigMap) is proposed. The proposed algorithms are applied to continuous density Hidden Markov Model (HMM) based speech recognition. The EigMap algorithm rapidly constructs discriminative acoustic models in the test speaker's eigenspace by preserving discriminative information learned from baseline models in the directions of the test speaker's eigenspace. Moreover, the adapted models are compressed by discarding model parameters that are assumed to contain no discrimination information. The core idea of EigMap can be extended in many ways, and a family of algorithms based on EigMap is described in this paper. Unsupervised adaptation experiments show that EigMap is effective in improving baseline models using very limited amounts of adaptation data with superior performance to conventional adaptation techniques such as MLLR and block diagonal MLLR. A relative improvement of 18.4% over a baseline recognizer is achieved using EigMap with only about 4.5 s of adaptation data. Furthermore, it is also demonstrated that EigMap is additive to MLLR by encompassing important speaker dependent discriminative information. A significant relative improvement of 24.6% over baseline is observed using 4.5 s of adaptation data by combining MLLR and EigMap techniques.
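The abstract's starting observation concerns the primary eigendirections of an utterance covariance matrix. As a rough illustration of that one ingredient only, the sketch below estimates the covariance of a set of feature frames and finds its leading eigendirection by power iteration. It is not the EigMap algorithm, whose mapping and model-compression steps are not specified in this abstract. The function names, the two-dimensional synthetic frames, and the use of power iteration are all assumptions made for the example.

```python
import math
import random


def covariance(frames):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        centered = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += centered[a] * centered[b] / (n - 1)
    return cov


def leading_eigendirection(cov, iters=200):
    """Power iteration: (eigenvalue, unit eigenvector) for the largest eigenvalue.

    Assumes cov is non-zero and that the uniform start vector is not
    orthogonal to the leading eigenvector.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue estimate for unit v.
    lam = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return lam, v


# Hypothetical stand-in for one utterance: 2-D frames that vary mostly
# along the direction (1, 2), plus a small amount of isotropic noise.
rng = random.Random(0)
frames = []
for _ in range(400):
    t = rng.gauss(0.0, 3.0)
    frames.append([t + rng.gauss(0.0, 0.1), 2.0 * t + rng.gauss(0.0, 0.1)])

lam, v = leading_eigendirection(covariance(frames))
print("leading eigenvalue:", lam)
print("leading eigendirection:", v)
```

With real speech, `frames` would hold per-frame acoustic feature vectors from a single utterance, and several leading directions would be extracted rather than one. A full symmetric eigensolver would normally replace power iteration in practice.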

Published in: IEEE Transactions on Speech and Audio Processing (Volume: 13, Issue: 4)