Abstract:
Expressive text-to-speech (TTS) synthesis should contribute to the pleasantness, intelligibility, and speed of speech-based human-machine interactions that use TTS. We describe a TTS engine that can be directed, via text markup, to use a variety of expressive styles: here, questioning, contrastive emphasis, and conveying good and bad news. Differences among these styles lead us to investigate two approaches to expressive TTS, a "corpus-driven" approach and a "prosodic-phonology" approach. Each speaker records 11 h (excluding silences) of "neutral" sentences. In the corpus-driven approach, the speaker also records a 1-h corpus in each expressive style; these segments are tagged by style for use during search, and decision trees for determining f0 contours and timing are trained separately for each of the neutral and expressive corpora. In the prosodic-phonology approach, rules translating certain expressive markup elements to tones and break indices (ToBI) are manually determined, and the ToBI elements are used in single f0 and duration trees for all expressions. Tests show that listeners correctly identify the intended synthesis style at rates ranging from 70% for "conveying bad news" to 85% for "yes-no questions". Further improvements are demonstrated through the use of speaker-pooled f0 and duration models.
Published in: IEEE Transactions on Audio, Speech, and Language Processing ( Volume: 14, Issue: 4, July 2006)
- IEEE Keywords
- Speech synthesis
- Humans
- Mood
- Engines
- Decision trees
- Timing
- Testing
- Bandwidth
- Information systems
- Marketing and sales
- Index Terms
- Decision Tree
- General Questions
- Good And Bad
- Good News
- Variety Of Styles
- Contralateral
- Cost Function
- Utterances
- Hidden Markov Model
- Syllable
- Tree Nodes
- Basic System
- Acoustic Features
- Words In Sentences
- Front End
- Prosodic
- Sentence Type
- Cost Matrix
- Dynamic Compression
- Voice Quality
- Vector Of Attributes
- Speech Synthesis
- Synthesis Quality
- Standard Analysis Of Variance
- Baseline System
- Speech Corpus
- Smooth Contour
- Statistical Models
- Context Vector
- Experimental System