Product of Experts for Statistical Parametric Speech Synthesis

4 Author(s)
Zen, H. ; Nagoya Inst. of Technol., Nagoya, Japan ; Gales, M.J.F. ; Nankaku, Y. ; Tokuda, K.

Multiple acoustic models are often combined in statistical parametric speech synthesis. Both linear and non-linear functions of an observation sequence are used as features to be modeled. This paper shows that this combination of multiple acoustic models can be expressed as a product of experts (PoE): the likelihoods from the models are scaled, multiplied together, and then normalized. Normally these models are trained individually and combined only at the synthesis stage. This paper discusses a more consistent PoE framework in which the models are jointly trained. A training algorithm for PoEs based on linear feature functions and Gaussian experts is derived by generalizing the training algorithm for trajectory HMMs. For non-linear feature functions or non-Gaussian experts, however, this is not possible, so a scheme based on contrastive divergence learning is described. Experimental results show that the PoE framework provides both a mathematically elegant way to train multiple acoustic models jointly and significant improvements in the quality of the synthesized speech.
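
The "scaled, multiplied together, and then normalized" combination can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the paper's training algorithm: the experts are scalar Gaussians acting directly on the observation (no feature functions, no HMM states), and the names `poe_gaussian`, `poe_density_numeric`, and `weights` are hypothetical. For Gaussian experts the normalized product is again Gaussian, so the closed form can be checked against a brute-force scale-multiply-normalize computation.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def poe_gaussian(means, variances, weights):
    """Closed-form product of scaled 1-D Gaussian experts.

    Each expert i contributes N(x; means[i], variances[i]) ** weights[i].
    Assumption for this sketch: experts act on x itself, not on features of x.
    Returns the (mean, variance) of the normalized product.
    """
    precision = sum(w / v for w, v in zip(weights, variances))
    mean = sum(w * m / v for w, m, v in zip(weights, means, variances)) / precision
    return mean, 1.0 / precision


def poe_density_numeric(x, means, variances, weights, lo=-20.0, hi=20.0, steps=4000):
    """Scale, multiply, then normalize by midpoint-rule integration on [lo, hi]."""
    def unnormalized(t):
        return math.prod(
            gaussian_pdf(t, m, v) ** w for m, v, w in zip(means, variances, weights)
        )

    dx = (hi - lo) / steps
    z = sum(unnormalized(lo + (i + 0.5) * dx) for i in range(steps)) * dx
    return unnormalized(x) / z


# Two unit-variance experts centered at 0 and 2, equally weighted.
mean, var = poe_gaussian([0.0, 2.0], [1.0, 1.0], [1.0, 1.0])
print(mean, var)  # 1.0 0.5
```

The product is sharper than either expert (variance 0.5 instead of 1.0), which is the usual behavior of a PoE: every expert must assign reasonable likelihood for the product to be large. The numerical version needs no Gaussian assumption, but grid integration is only feasible in very low dimensions, which is consistent with the abstract's point that non-Gaussian experts call for an approximate scheme such as contrastive divergence.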

Published in:

IEEE Transactions on Audio, Speech, and Language Processing (Volume: 20, Issue: 3)