The paper presents the implementation of a Chinese text-to-speech (TTS) system based on the Time-Domain Pitch-Synchronous OverLap-Add (TD-PSOLA) approach. To obtain natural synthesized speech, three things are necessary. Pitch marks must be extracted precisely for each monosyllabic speech unit. The duration of each syllable in the sentence to be synthesized must be predicted. F0 contours must be generated for the final (tone-bearing) portion of each syllable. We concentrate on the last two issues. We propose a scheme for predicting syllable duration, which achieves a relative length error of about 18%, and a method for generating F0 contours. To synthesize a tonal syllable with a desired duration, we also propose a new pattern-scaling algorithm. A preliminary listening test showed that the synthetic speech has good intelligibility and naturalness.
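TD-PSOLA, on which the system is built, changes the duration and pitch of a voiced unit by overlap-adding two-period windowed segments centred on its pitch marks. The sketch below is a plain textbook TD-PSOLA in pure Python. It is not the authors' pattern-scaling algorithm, which the abstract does not describe. The function names `td_psola`, `nearest_mark`, and `hann`, the choice of a Hann window, and the nearest-mark mapping are all illustrative assumptions. The sketch also assumes that pitch marks have already been extracted and are strictly increasing sample indices.

```python
import bisect
import math


def hann(n):
    """Hann window of length n (n >= 2) whose end samples are zero."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def nearest_mark(marks, t):
    """Index of the analysis pitch mark closest to time t (marks sorted)."""
    j = bisect.bisect_left(marks, t)
    if j == 0:
        return 0
    if j == len(marks):
        return len(marks) - 1
    return j if marks[j] - t < t - marks[j - 1] else j - 1


def td_psola(x, marks, time_scale=1.0, pitch_scale=1.0):
    """Minimal TD-PSOLA sketch (hypothetical helper, not the paper's code).

    x           -- samples of one voiced unit (list of floats)
    marks       -- strictly increasing analysis pitch-mark indices into x
    time_scale  -- output length / input length (2.0 doubles the duration)
    pitch_scale -- output F0 / input F0 (1.0 keeps the original pitch)
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    if time_scale <= 0 or pitch_scale <= 0:
        raise ValueError("scale factors must be positive")
    # Local pitch period at each mark; the last mark reuses the previous period.
    periods = [marks[k + 1] - marks[k] for k in range(len(marks) - 1)]
    periods.append(periods[-1])

    out_len = int(round(len(x) * time_scale))
    y = [0.0] * out_len
    t_out = marks[0] * time_scale  # first synthesis pitch mark
    while t_out < out_len:
        # Map the synthesis mark back onto the input time axis and pick the
        # closest analysis mark, so segments are repeated or skipped as needed.
        k = nearest_mark(marks, t_out / time_scale)
        p = periods[k]
        w = hann(2 * p + 1)  # two-period window centred on the mark
        centre_in, centre_out = marks[k], int(round(t_out))
        for i in range(-p, p + 1):
            src, dst = centre_in + i, centre_out + i
            if 0 <= src < len(x) and 0 <= dst < out_len:
                y[dst] += x[src] * w[i + p]
        # The spacing of synthesis marks sets the output pitch period.
        t_out += p / pitch_scale
    return y


if __name__ == "__main__":
    # A 1000-sample periodic "syllable" with a period of 50 samples.
    x = [math.sin(2 * math.pi * n / 50) for n in range(1000)]
    marks = list(range(0, 1000, 50))
    y = td_psola(x, marks, time_scale=2.0)  # twice as long, same pitch
    print(len(y))  # 2000
```

With both scale factors at 1 and evenly spaced marks, adjacent Hann windows sum to one, so the input is reproduced between the first and last mark. Lengthening works by reusing segments rather than resampling the waveform, which is why F0 stays unchanged when only `time_scale` is altered.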