By Topic

Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

3 Author(s)
Saheer, L. ; Idiap Res. Inst., Martigny, Switzerland ; Dines, J. ; Garner, P.N.

Vocal tract length normalization (VTLN) has been successfully used in automatic speech recognition for improved performance. The same technique can be implemented in statistical parametric speech synthesis for rapid speaker adaptation during synthesis. This paper presents an efficient implementation of VTLN using expectation maximization and addresses the key challenges faced in implementing VTLN for synthesis. Jacobian normalization, high-dimensionality features and truncation of the transformation matrix are a few challenges presented with the appropriate solutions. Detailed evaluations are performed to estimate the most suitable technique for using VTLN in speech synthesis. Evaluating VTLN in the framework of speech synthesis is also not an easy task since the technique does not work equally well for all speakers. Speakers have been selected based on different objective and subjective criteria to demonstrate the difference between systems. The best method for implementing VTLN is confirmed to be use of the lower order features for estimating warping factors.

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:20 ,  Issue: 7 )