
MELS-TTS: Multi-Emotion Multi-Lingual Multi-Speaker Text-to-Speech System via Disentangled Style Tokens


Abstract:

This paper proposes a multi-emotion, multi-lingual, and multi-speaker text-to-speech (MELS-TTS) system, employing disentangled style tokens for effective emotion transfer. In speech encompassing various attributes, such as emotional state, speaker identity, and linguistic style, disentangling these elements is crucial for an efficient multi-emotion, multi-lingual, and multi-speaker TTS system. To accomplish this purpose, we propose to utilize separate style tokens to disentangle emotion, language, speaker, and residual information, inspired by the global style tokens (GSTs). Through the attention mechanism, each style token learns its respective speech attribute from the target speech. Our proposed approach yields improved performance in both objective and subjective evaluations, demonstrating the ability to generate cross-lingual speech with diverse emotions, even from a neutral source speaker, while preserving the speaker’s identity.
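
The abstract describes separate learnable token banks, inspired by global style tokens (GST), with each bank attended over by a reference embedding of the target speech so that emotion, language, speaker, and residual information are captured independently. The sketch below illustrates that idea in PyTorch; the module names, token counts, and dimensions are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of disentangled style-token banks, loosely following GST.
# All hyperparameters (10 tokens per bank, 128-d tokens, 256-d reference
# embedding) are assumptions for illustration only.
import torch
import torch.nn as nn


class StyleTokenBank(nn.Module):
    """One bank of learnable style tokens attended over by a reference embedding."""

    def __init__(self, num_tokens: int, token_dim: int, ref_dim: int, num_heads: int = 4):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.query_proj = nn.Linear(ref_dim, token_dim)
        self.attention = nn.MultiheadAttention(
            embed_dim=token_dim, num_heads=num_heads, batch_first=True
        )

    def forward(self, ref_embedding: torch.Tensor) -> torch.Tensor:
        # ref_embedding: (batch, ref_dim) summary of the target speech.
        query = self.query_proj(ref_embedding).unsqueeze(1)              # (B, 1, D)
        keys = self.tokens.unsqueeze(0).expand(ref_embedding.size(0), -1, -1)
        style, _ = self.attention(query, keys, keys)                     # (B, 1, D)
        return style.squeeze(1)                                          # (B, D)


class DisentangledStyleEncoder(nn.Module):
    """Separate token banks for emotion, language, speaker, and residual info."""

    def __init__(self, ref_dim: int = 256, token_dim: int = 128, tokens_per_bank: int = 10):
        super().__init__()
        self.banks = nn.ModuleDict({
            name: StyleTokenBank(tokens_per_bank, token_dim, ref_dim)
            for name in ("emotion", "language", "speaker", "residual")
        })

    def forward(self, ref_embedding: torch.Tensor) -> dict:
        # Each bank yields one attribute-specific style embedding. At synthesis
        # time the embeddings can be recombined, e.g. an emotion embedding from
        # one reference with the speaker embedding of a neutral source speaker,
        # which is the cross-lingual emotion-transfer setting the paper targets.
        return {name: bank(ref_embedding) for name, bank in self.banks.items()}
```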
Date of Conference: 14-19 April 2024
Date Added to IEEE Xplore: 18 March 2024
Conference Location: Seoul, Korea, Republic of
