Anchored Monotonic Alignment and Representation Substitution for Rare Spontaneous Behaviors in Spontaneous Speech Synthesis


Abstract:

Spontaneous behaviors in speech pose significant challenges for speech synthesis. Existing research has not adequately addressed these behaviors, with most studies relying on specially recorded datasets. In contrast, real-world data more accurately reflects the natural, spontaneous speaking styles in everyday life and encompasses a wider range of spontaneous behaviors. However, such data is often of lower quality, and the distribution of spontaneous behaviors is highly imbalanced. In this study, we explore spontaneous speech synthesis using real-world data within the VITS2 framework. To overcome these challenges, we introduce two techniques: anchored monotonic alignment and spontaneous hidden representation substitution. Experimental results demonstrate that these methods enhance model alignment and improve the naturalness of the generated speech. Our proposed approach successfully addresses the challenge of synthesizing rare spontaneous behaviors and offers users flexible control over the synthesized speech.
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India
