This paper improves a minimum generation error (MGE) based HMM training technique for HMM-based speech synthesis by using the original spectrum directly, instead of line spectral pairs (LSPs), as the reference spectrum for the log spectral distortion (LSD) measure. Two types of original reference spectrum for the LSD calculation are investigated: the spectrum extracted from the speech waveform by STRAIGHT, and the short-time FFT spectrum calculated from the speech waveform. Since only the harmonics of the FFT spectrum coincide with the underlying spectral envelope, the LSD between the generated LSPs and the original FFT spectrum is calculated by sampling at the harmonic frequencies, and a weighting function is designed to simulate the sampling strategy on LSPs. In the experiments, MGE-LSD training using the FFT spectrum as the reference spectrum achieved the best performance.
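The harmonic sampling idea can be illustrated with a minimal sketch. It is not the paper's formulation: the abstract gives neither the exact LSD definition nor the weighting function, so the code assumes the common RMS form of LSD in dB and picks the FFT bin nearest to each multiple of the fundamental frequency. The names `harmonic_bins` and `lsd_db`, and the parameters `f0_hz`, `sample_rate_hz`, `fft_size` and `floor`, are hypothetical and introduced only for illustration.

```python
import math


def harmonic_bins(f0_hz, sample_rate_hz, fft_size):
    """Return the FFT bin index nearest to each harmonic k*f0 below Nyquist."""
    nyquist = sample_rate_hz / 2.0
    bins = []
    k = 1
    while k * f0_hz < nyquist:
        # Bin b of an fft_size-point FFT sits at b * sample_rate_hz / fft_size Hz.
        bins.append(round(k * f0_hz * fft_size / sample_rate_hz))
        k += 1
    return bins


def lsd_db(ref_mag, gen_mag, bins=None, floor=1e-10):
    """RMS log spectral distortion in dB over the given bins (all bins if None).

    Assumption: LSD is the root mean square of the dB difference between the
    reference and generated magnitude spectra; the paper may define it differently.
    """
    if bins is None:
        bins = range(len(ref_mag))
    bins = list(bins)
    if not bins:
        raise ValueError("no frequency bins to compare")
    total = 0.0
    for b in bins:
        # Floor the magnitudes to avoid taking the log of zero.
        ref_db = 20.0 * math.log10(max(ref_mag[b], floor))
        gen_db = 20.0 * math.log10(max(gen_mag[b], floor))
        total += (ref_db - gen_db) ** 2
    return math.sqrt(total / len(bins))


# Toy example: a flat generated envelope, and an "FFT" reference that matches
# it only at the harmonics and drops by a factor of 10 everywhere else.
bins = harmonic_bins(f0_hz=250.0, sample_rate_hz=16000.0, fft_size=512)
gen = [1.0] * 257
ref = [0.1] * 257
for b in bins:
    ref[b] = 1.0

lsd_harmonic = lsd_db(ref, gen, bins)  # compares only at the harmonics
lsd_full = lsd_db(ref, gen)            # compares at every bin
```

In this toy case the harmonic-sampled distortion is zero because the envelope agrees with the reference at every harmonic, while the full-band distortion is large because it also penalizes the valleys between harmonics, which carry no information about the envelope.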