This paper describes an MLLR-based speaking style adaptation technique for HMM-based speech synthesis. Since speaking styles and emotional expressions are characterized by many suprasegmental features as well as segmental features, speaking style adaptation must also adapt the suprasegmental features. To achieve this, we use the context clustering decision trees constructed in the training stage to tie the regression matrices. Using this technique, we adapt an initial "reading" style model to "joyful" or "sad" styles. Experimental results show that, with 50 adaptation sentences, speech samples generated from the adapted models were judged similar to the target speaking styles at rates of 92% for the joyful style and 70% for the sad style.
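The idea of tying regression matrices can be illustrated with a minimal sketch of standard MLLR mean adaptation, using the closed-form estimate for diagonal-covariance Gaussians. The sketch assumes that each Gaussian has already been assigned a regression class (for example, by cutting a context clustering decision tree at some node) and that occupancy statistics have been accumulated from the adaptation data. The function names `estimate_mllr_transform` and `adapt_means` and the input layout are hypothetical. The sketch also ignores details such as multi-space F0 distributions and state-duration models, so it is not the paper's exact implementation.

```python
import numpy as np


def estimate_mllr_transform(means, variances, occ, obs_sum):
    """Closed-form MLLR mean transform W (D x (D+1)) for one regression class.

    Hypothetical input layout (an assumption, not from the paper):
      means, variances: (M, D) Gaussian means and diagonal variances
      occ:              (M,)   occupancies, sum_t gamma_m(t)
      obs_sum:          (M, D) weighted observation sums, sum_t gamma_m(t) o_t
    """
    M, D = means.shape
    xi = np.hstack([np.ones((M, 1)), means])  # extended mean vectors [1, mu]
    W = np.zeros((D, D + 1))
    for i in range(D):
        inv_var = 1.0 / variances[:, i]
        # G_i = sum_m occ_m / var_{m,i} * xi_m xi_m^T
        G = (xi * (occ * inv_var)[:, None]).T @ xi
        # k_i = sum_m obs_sum_{m,i} / var_{m,i} * xi_m
        k = (obs_sum[:, i] * inv_var) @ xi
        W[i] = np.linalg.solve(G, k)  # row i of W satisfies G_i w_i = k_i
    return W


def adapt_means(means, variances, occ, obs_sum, reg_class):
    """Estimate one shared (tied) transform per regression class and apply it."""
    adapted = means.copy()
    for c in np.unique(reg_class):
        idx = reg_class == c
        W = estimate_mllr_transform(means[idx], variances[idx], occ[idx], obs_sum[idx])
        xi = np.hstack([np.ones((idx.sum(), 1)), means[idx]])
        adapted[idx] = xi @ W.T  # mu_hat = W [1, mu]
    return adapted
```

Every Gaussian in a class shares one transform, so a class built from a decision-tree node also moves the means of contexts that have little or no adaptation data. Each class still needs enough occupied Gaussians for `G` to be invertible.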
Date of Conference: 17-21 May 2004