Factored Maximum Penalized Likelihood Kernel Regression for HMM-Based Style-Adaptive Speech Synthesis

Authors:
June Sig Sung (School of Electrical Engineering & INMC, Seoul National University, Seoul, South Korea); Doo Hwa Hong; Nam Soo Kim

Speech synthesized from the same text should sound different depending on the speaking style. Current speech synthesis techniques based on the hidden Markov model (HMM) usually focus on a fixed speaking style, and changing the speaking style requires multiple parameter sets, each trained on a different speaking style. A promising alternative is to adapt the base model to the intended speaking style. In our previous work, we proposed factored maximum likelihood linear regression (FMLLR) adaptation, in which each MLLR parameter is defined as a function of a control vector, and we presented a method to train the FMLLR parameters within the general framework of the expectation-maximization (EM) algorithm. In this paper, we introduce a novel technique called factored maximum penalized likelihood kernel regression (FMLKR) for HMM-based style-adaptive speech synthesis. In FMLKR, nonlinear regression between the mean vectors of the base model and the corresponding mean vectors of the adaptation data is performed using a kernel method built on the FMLLR framework. In a series of experiments on the artificial generation of singing voice and expressive speech, we evaluate the performance of the FMLLR and FMLKR techniques with various matrix structures and compare them with other approaches to parameter adaptation in HMM-based speech synthesis.
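
To make the idea of an MLLR transform that depends on a control vector more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the authors' formulation: the abstract does not say how the MLLR parameters depend on the control vector, so the sketch assumes a simple linear combination of basis transforms. The function name `factored_mllr_mean` and the variables `basis_transforms` and `control` are hypothetical.

```python
import numpy as np

def factored_mllr_mean(mu, basis_transforms, control):
    """Adapt a base mean vector with a control-dependent affine transform.

    Assumption (not stated in the abstract): the transform is a linear
    combination W(s) = sum_k s_k * W_k of hypothetical basis matrices.
    """
    mu = np.asarray(mu, dtype=float)
    control = np.asarray(control, dtype=float)
    basis = np.asarray(basis_transforms, dtype=float)  # shape (K, D, D + 1)
    # Combine the basis transforms according to the control vector.
    W = np.tensordot(control, basis, axes=1)           # shape (D, D + 1)
    # Extended mean vector [1, mu], so W carries both bias and linear parts.
    xi = np.concatenate(([1.0], mu))
    return W @ xi

# Toy example: one basis transform keeps the mean, the other adds a bias.
D = 3
identity = np.hstack([np.zeros((D, 1)), np.eye(D)])     # leaves mu unchanged
shift = np.hstack([np.ones((D, 1)), np.zeros((D, D))])  # contributes bias only
mu = np.array([0.5, -1.0, 2.0])
neutral = factored_mllr_mean(mu, [identity, shift], [1.0, 0.0])
styled = factored_mllr_mean(mu, [identity, shift], [1.0, 0.3])
```

With the control vector `[1.0, 0.0]` the base mean is returned unchanged, while `[1.0, 0.3]` shifts every component by 0.3. In this toy setup, varying the control vector moves the adapted mean continuously between styles. The kernel-based FMLKR variant described in the abstract replaces this linear mapping with a nonlinear regression, which is not reproduced here.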

Published in:

IEEE Journal of Selected Topics in Signal Processing (Volume 8, Issue 2)