Minimum generation error training by using original spectrum as reference for log spectral distortion measure

2 Author(s)
Yi-Jian Wu ; Nagoya Inst. of Technol., Nagoya ; Tokuda, K.

This paper improves a minimum generation error (MGE) based HMM training technique for HMM-based speech synthesis by directly using the original spectrum, instead of line spectral pairs (LSPs), as the reference spectrum for the log spectral distortion (LSD) measure. Two types of original reference spectra for the LSD calculation are investigated: the spectrum extracted from the speech waveform by STRAIGHT, and the short-time FFT spectrum calculated from the speech waveform. Since only the harmonics of the FFT spectrum coincide with the underlying spectral envelope, the LSD between the generated LSPs and the original FFT spectrum is calculated by sampling at the harmonic frequencies, and a weighting function is designed to simulate the sampling strategy on LSPs. In the experiments, MGE-LSD training using the FFT spectrum as the reference spectrum achieved the best performance.
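
The idea of measuring LSD only at harmonic frequencies can be illustrated with a minimal sketch. It is not the paper's formulation: it assumes a common textbook definition of LSD (root-mean-square difference of two power spectra in dB), assumes both spectra are already sampled on the same FFT grid (the conversion from generated LSPs to a spectrum is omitted), and picks the FFT bin nearest to each multiple of the fundamental frequency. The function names, the optional `weights` argument, and the nearest-bin rule are all hypothetical choices made for illustration; the abstract does not specify the weighting function.

```python
import numpy as np


def log_spectral_distortion(ref_power, gen_power, weights=None):
    """Weighted RMS difference in dB between two power spectra.

    Assumption: both arrays hold strictly positive power values sampled
    at the same frequencies. `weights` is a hypothetical per-bin weight.
    """
    ref_db = 10.0 * np.log10(ref_power)
    gen_db = 10.0 * np.log10(gen_power)
    diff_sq = (ref_db - gen_db) ** 2
    if weights is None:
        weights = np.ones_like(diff_sq)
    return float(np.sqrt(np.sum(weights * diff_sq) / np.sum(weights)))


def harmonic_bins(f0_hz, sample_rate_hz, n_fft):
    """Indices of the FFT bins nearest to each harmonic of f0 up to Nyquist."""
    n_harmonics = int((sample_rate_hz / 2.0) // f0_hz)
    harmonic_freqs = f0_hz * np.arange(1, n_harmonics + 1)
    return np.round(harmonic_freqs * n_fft / sample_rate_hz).astype(int)


def harmonic_lsd(ref_power, gen_power, f0_hz, sample_rate_hz, n_fft):
    """LSD evaluated only at the harmonic bins of a voiced frame."""
    idx = harmonic_bins(f0_hz, sample_rate_hz, n_fft)
    return log_spectral_distortion(ref_power[idx], gen_power[idx])


# Toy one-sided spectra with n_fft // 2 + 1 bins (synthetic values, not real data).
n_fft, sample_rate_hz, f0_hz = 512, 16000, 100.0
ref = np.linspace(1.0, 2.0, n_fft // 2 + 1)
gen = 10.0 * ref  # ten times the power, i.e. a constant 10 dB offset
lsd_harm = harmonic_lsd(ref, gen, f0_hz, sample_rate_hz, n_fft)
```

Because the toy generated spectrum differs from the reference by a constant factor in power, the distortion is the same at every bin, so sampling at the harmonics gives the same value as the full-band measure. With a real FFT spectrum the two would differ, since the bins between harmonics do not follow the spectral envelope.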

Published in:

Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on

Date of Conference:

19-24 April 2009