A text-to-speech conversion system for Japanese, developed to produce high-quality speech output, is presented. It consists of four processing stages: (1) linguistic processing, (2) phonological processing, (3) control parameter generation, and (4) speech waveform generation. After an overview of the whole system, the innovations introduced in the second and fourth stages are described: rules for generating prosodic symbols from linguistic information in the second stage, and the configuration of a new type of terminal analog speech synthesizer that serves as the fourth stage. The validity of the approach is confirmed by improvements in both the prosodic and the segmental quality of the synthesized speech.
Published in: 1990 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-90)
Date of Conference: 3-6 Apr 1990
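
The four-stage organization named in the abstract can be pictured as a chain of functions, each consuming the output of the one before it. The sketch below is only an illustration of that ordering, not the authors' implementation: the abstract does not specify any data formats or algorithms, so `run_pipeline`, `make_placeholder_stages`, and the dictionary payloads are hypothetical names and structures, and every stage is a placeholder that merely tags its input.

```python
from typing import Any, Callable, List, Tuple

# Stage names taken from the abstract, in processing order.
STAGE_NAMES: List[str] = [
    "linguistic processing",
    "phonological processing",
    "control parameter generation",
    "speech waveform generation",
]

# A stage is a (name, function) pair; the function maps one
# intermediate representation to the next.
Stage = Tuple[str, Callable[[Any], Any]]


def make_placeholder_stages() -> List[Stage]:
    """Build hypothetical stand-ins for the four stages.

    Assumption: the real stages would perform text analysis, prosodic
    symbol generation, parameter generation, and synthesis. Here each
    one only wraps its input so the data flow stays visible.
    """
    stages: List[Stage] = []
    for name in STAGE_NAMES:
        # Bind `name` as a default argument so each lambda keeps its own value.
        stages.append((name, lambda data, n=name: {"stage": n, "input": data}))
    return stages


def run_pipeline(text: str, stages: List[Stage]) -> Tuple[Any, List[str]]:
    """Feed `text` through the stages in order and record which ones ran."""
    data: Any = text
    trace: List[str] = []
    for name, stage_fn in stages:
        data = stage_fn(data)
        trace.append(name)
    return data, trace


if __name__ == "__main__":
    result, trace = run_pipeline("konnichiwa", make_placeholder_stages())
    for step, name in enumerate(trace, start=1):
        print(f"({step}) {name}")
```

Keeping each stage behind a plain function boundary mirrors the abstract's point that the second and fourth stages were redesigned: in a layout like this, one stage can be replaced without touching the others.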