Cart (Loading....) | Create Account
Close category search window

Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System

Sign In

Cookies must be enabled to login.After enabling cookies , please use refresh or reload or ctrl+f5 on the browser for the login options.

Formats Non-Member Member
$31 $13
Learn how you can qualify for the best price for this item!
Become an IEEE Member or Subscribe to
IEEE Xplore for exclusive pricing!
close button

puzzle piece

IEEE membership options for an individual and IEEE Xplore subscriptions for an organization offer the most affordable access to essential journal articles, conference papers, standards, eBooks, and eLearning courses.

Learn more about:

IEEE membership

IEEE Xplore subscriptions

4 Author(s)
Zhiyong Wu ; Dept. of Syst. Eng. & Eng. Manage., Chinese Univ. of Hong Kong (CUHK), Hong Kong, China ; Meng, H.M. ; Hongwu Yang ; Lianhong Cai

This work focuses on the development of expressive text-to-speech synthesis techniques for a Chinese spoken dialog system, where the expressivity is driven by the message content. We adapt the three-dimensional pleasure-displeasure, arousal-nonarousal and dominance-submissiveness (PAD) model for describing expressivity in input text semantics. The context of our study is based on response messages generated by a spoken dialog system in the tourist information domain. We use the P (pleasure) and A (arousal) dimensions to describe expressivity at the prosodic word level based on lexical semantics. The D (dominance) dimension is used to describe expressivity at the utterance level based on dialog acts. We analyze contrastive (neutral versus expressive) speech recordings to develop a nonlinear perturbation model that incorporates the PAD values of a response message to transform neutral speech into expressive speech. Two levels of perturbations are implemented-local perturbation at the prosodic word level, as well as global perturbation at the utterance level. Perceptual experiments involving 14 subjects indicate that the proposed approach can significantly enhance expressivity in response generation for a spoken dialog system.

Published in:

Audio, Speech, and Language Processing, IEEE Transactions on  (Volume:17 ,  Issue: 8 )

Date of Publication:

Nov. 2009

Need Help?

IEEE Advancing Technology for Humanity About IEEE Xplore | Contact | Help | Terms of Use | Nondiscrimination Policy | Site Map | Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest professional association for the advancement of technology.
© Copyright 2014 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.