
A Parameter Generation Algorithm Using Local Variance for HMM-Based Speech Synthesis


Authors
Nose, T. (Dept. of Inf. Process., Tokyo Inst. of Technol., Yokohama, Japan); Chunwijitra, V.; Kobayashi, T.

This paper proposes a parameter generation algorithm using a local variance (LV) model in HMM-based speech synthesis. In the proposed technique, we define the LV as a feature that represents the local variation of a spectral parameter sequence and model LVs using HMMs. Context-dependent HMMs are used to capture the dependence of LV trajectories on phonetic and prosodic contexts. In addition, the dynamic features of LVs are taken into account as well as the static one to appropriately model the dynamic characteristics of LV trajectories. By introducing the LV model into the spectral parameter generation process, the proposed technique can impose a more precise variance constraint for each frame than the conventional technique with a global variance (GV) model. Consequently, the proposed technique alleviates the excessive spectral peak enhancement that often occurs in GV-based parameter generation. Objective evaluation results show that the proposed technique can generate better spectral parameter trajectories than the GV-based technique in terms of spectral and LV distortion. Moreover, the results of subjective evaluation demonstrate that the proposed technique can generate synthetic speech significantly closer to the original one than the conventional technique while maintaining speech naturalness.
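To make the contrast between a global and a local variance constraint concrete, the following is a minimal sketch and not the paper's formulation. The abstract does not give the exact definition of the LV, so the sliding-window variance used here, the `half_width` parameter, the function names, and the toy trajectory are all assumptions chosen for illustration.

```python
from statistics import pvariance


def global_variance(seq):
    """Single variance value computed over the whole sequence (GV-style)."""
    return pvariance(seq)


def local_variance(seq, half_width=2):
    """Per-frame variance over a window centred on each frame.

    ASSUMPTION: a symmetric window truncated at the sequence boundaries.
    The paper's actual LV definition may differ.
    """
    out = []
    for t in range(len(seq)):
        lo = max(0, t - half_width)
        hi = min(len(seq), t + half_width + 1)
        out.append(pvariance(seq[lo:hi]))
    return out


# Hypothetical one-dimensional spectral parameter trajectory:
# a flat stretch followed by a rapidly varying stretch.
traj = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]

gv = global_variance(traj)               # one number for the whole sequence
lv = local_variance(traj, half_width=1)  # one number per frame

print("GV:", gv)
print("LV:", lv)
```

In this toy case the global value is a single number that averages over both stretches, while the per-frame values are zero in the flat stretch and larger in the varying one. That is the sense in which a frame-level constraint can be more precise than an utterance-level one.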

Published in:

IEEE Journal of Selected Topics in Signal Processing (Volume: 8, Issue: 2)